ScaDaMaLe Course site and book

This notebook series was updated from the previous one, sds-2-x-dl, on 2022-01-17. See changes from the previous version in the table below, as well as current flaws that need revision.

Thanks to ...


Table of changes and current flaws

Notebook: Changes
031cmd02: updated instructions
031cmd09: added contains in predicates
031cmd13, cm16: deleted as it was the same as cmd12, cm14
031cmd21: added markdown mentioning the run time for the two methods
031cmd64: added markdown with shapefile info
031cmd71: deleted, redundant
031acmd11: deleted, it was commented out and not necessary
031acmd13: added comments
031acmd20: added comments
031acmd24: deleted redundant comments
031acmd25: deleted redundant comments
031acmd28-cmd36: added exercise solution (tiny area center of Stockholm)
032cmd24-28: downloaded the 2017 data and created the schema. The schema uses pick-up and drop-off location IDs instead of coordinates, so we couldn't proceed with Magellan's points.
032acmd9: added markdown for disk usage
032a: deleted what were previously cmd35-36, as their respective notebook is missing
032dcmd2: link not working (to be fixed by Raaz?)
032dcmd5: added markdown to explain how to load the data
032dcmd6-10: added cells to load the data
032dcmd11: added markdown with info about creating tables
032dcmd12-13: created mobile_sample table
032dcmd14: added markdown about missing data: from mobile_sample (DeviceMake, ClientId and Country columns) and no data about country codes
032dcmd15-24: not working because of missing data

Intro to GIS

Raazesh Sainudiin, Marina Toger

  • To illustrate the specifics of geospatial information, here we use QGIS software locally on your laptops.
  • Later Raaz will show scalable geospatial analysis using Magellan.

Birth of GIS: 1854 Cholera outbreak in London

Dr. John Snow, the father of modern epidemiology, GIS and spatial analysis, hypothesised by mapping the cases that cholera was transmitted through drinking polluted water, rather than through the air as was commonly believed.

displayHTML(frameIt("https://ds8.gitbooks.io/textbook/content/chapters/02/1/observation-and-visualization-john-snow-and-the-broad-street-pump.html",555))

GIS components

From: Longley, P. A., Goodchild, M. F., Maguire, D. J., & Rhind, D. W. (2005). Geographical information systems and science.

Applications of GIS

GIS software

Source: Kauri_Kiiman,2013

displayHTML(frameIt("https://en.wikipedia.org/wiki/List_of_geographic_information_systems_software",555))

The majority of governmental agencies and information providers are still on proprietary, mostly desktop, GIS using legacy data formats like shapefiles. Researchers in various fields (e.g. ecology, geography, regional science) often use Python, R and PostGIS SQL, combined with desktop GIS software for visualisation.

Some of the biggest players:

  • proprietary Desktop ArcGIS of ESRI (Windows OS)
  • free and open-source QGIS (large community of developers, cross-platform)
  • free R - a software environment for statistical computing and graphics; some people use R as a GIS

These are just a few of many software packages, platforms and tools.

Why Magellan? - scalable.

From Ram's slide 12 of Magellan FOSS4G Talk, Boston 2017 at slideshare

In the future we might add GeoMesa

Do we need one more geospatial analytics library?

From Ram's slide 4 of this Spark Summit East 2016 talk at slideshare:

  • Spatial Analytics at scale is challenging
    • Simplicity + Scalability = Hard
  • Ancient Data Formats
    • metadata, indexing not handled well, inefficient storage
  • Geospatial Analytics is not simply Business Intelligence anymore
    • Statistical + Machine Learning being leveraged in geospatial
  • Now is the time to do it!
    • Explosion of mobile data
    • Finer granularity of data collection for geometries
    • Analytics stretching the limits of traditional approaches
    • Spark SQL + Catalyst + Tungsten makes extensible SQL engines easier than ever before!

Crash course in GIS using QGIS software locally on your laptops.

I learned from Ujaval Gandhi's excellent albeit outdated QGIS tutorials as well as a lot of playing around. Here I use some of his materials but if you are really interested in geospatial data, I suggest you follow his original QGIS tutorials for a fast dive into the GIS world. You can also learn using A Gentle Introduction to GIS and the Training Materials from the QGIS docs and more.

About

From qgis.org: QGIS is a user friendly Open Source Geographic Information System (GIS) licensed under the GNU Public License (GPL) Version 2 or above. QGIS is an official project of the Open Source Geospatial Foundation (OSGeo). It runs on Linux, Unix, Mac OSX, Windows and Android and supports numerous vector, raster, and database formats and functionalities.

Here we shall use QGIS 3.0.2-Girona, the current version fresh out of the oven (as of 2018-04-29, released 2018-04-20), based on Python 3.6.

1. Setting up

Installation pains

Basically, go to the QGIS download page and follow the instructions for your OS. We add step-by-step tutorials for the OS versions we tried, which were correct as of the day we tried them. QGIS has a vibrant community of contributors, so this will get outdated fast.

displayHTML(frameIt("https://qgis.org/en/site/forusers/download.html", 444))

MAC

The following worked for me on OSX Yosemite 10.10.5 on a MBP from early 2011, using QGIS macOS Installer Version 3.0

0. Check if you have Python 3.6+ and install it if not (this isn't in the bundle; only python.org Python 3 is supported)

I already had it:

$ python3

Python 3.6.3 (v3.6.3:2c5fed86e0, Oct  3 2017, 00:32:08) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 

If this step fails for some reason, you can install QGIS 2.18 LTR, which is currently based on Python 2.7.

1. Install GDAL

2. Install QGIS

Windows

The following worked for me on Windows 10 Enterprise v.1709, using OSGeo4W Network Installer .

Let's get our hands dirty

Hello world in QGIS

Copy the following to your favourite text editor and save as a csv file:

id, lat, long, name
1, 59.839264, 17.647075, point1

Open QGIS3 and start a new project

Open the Data Source Manager (click the Open Data Source Manager icon or press ⌘L)

Select Delimited Text > CSV, X and Y field, CRS > Add> Close

You have created a temporary layer containing a point.

GIS Data are stored in Vector or Raster Layers

You have created a temporary layer containing a point. To save it, right-click the layer > SaveAs

Let's have a look at the options we have:

Now you store the point in shapefile format as a file.

Most common basic vector data structures - ESRI Shapefiles

  • Points
  • Polygons
  • Polylines

Spatial data (invisible to the user in shapefile format) + attribute tables

displayHTML(frameIt("https://en.wikipedia.org/wiki/Shapefile", 444))

ls the folder containing the shapefile you just saved. There are 6 files created with the same name:

$ ls -S -lh | awk '{print $5, $9}'
 
257B point4326.qpj
147B point4326.dbf
143B point4326.prj
128B point4326.shp
108B point4326.shx
5B point4326.cpg

The original CSV, Point1.csv, was 24B.

To prepare you for working with Magellan, we explore the basic Geometries and Predicates in QGIS.

Geometries:

  • Point
  • LineString
  • Polygon
  • MultiPoint
  • MultiPolygon (treated as a collection of Polygons and read in as a row per polygon by the GeoJSON reader)

Predicates:

  • Intersects
  • Contains
  • Within

For more info look at the magellan README in github: https://github.com/harsha2010/magellan

Let's look at Magellan supported formats for geometry

The library currently supports reading the following formats:

ESRI Shapefiles - 788 bytes

An ancient but openly documented format, and widely used; most data sources are shapefiles.

Our point information is contained in 6 files (!); not all are required (they are created by default by QGIS).

.prj is the projection file. Open it with a text editor and have a look:

GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]

Our default projection is WGS_1984. More on this later...

let's save our point in other formats

WKT of our point - 76 bytes

WKT;y;x
"POINT (17.647075 59.839264)";59.839264000000000;17.647075000000001

GeoJSON of our point - 305 bytes

{
"type": "FeatureCollection",
"name": "GJSpoint4326",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features": [
{ "type": "Feature", "properties": { "y": 59.839264, "x": 17.647075 }, "geometry": { "type": "Point", "coordinates": [ 17.647075, 59.839264 ] } }
]
}

The numbers are coordinates: long, lat (Easting, Northing) in decimal degrees, as in Google Maps.
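Since GeoJSON is just JSON, you can check the coordinate order programmatically. A minimal Python sketch (outside QGIS, standard library only) reading the feature above:

```python
import json

# GeoJSON of the point saved from QGIS (content copied from above)
geojson = '''
{
"type": "FeatureCollection",
"name": "GJSpoint4326",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features": [
{ "type": "Feature", "properties": { "y": 59.839264, "x": 17.647075 }, "geometry": { "type": "Point", "coordinates": [ 17.647075, 59.839264 ] } }
]
}
'''

fc = json.loads(geojson)
# note the order in "coordinates": longitude first, then latitude
lon, lat = fc["features"][0]["geometry"]["coordinates"]
print(lon, lat)
```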

Datum, reference surface and projection

The objective is to project earth surface to Cartesian coordinates

Source: Nathan P. Belz 2012

Bottom line, whatever projection is selected, there is always a distortion.

HOMEWORK and recommended reading on projections:

  • https://kartoweb.itc.nl/geometrics/Introduction/introduction.html
  • https://kartoweb.itc.nl/geometrics/Reference%20surfaces/body.htm
  • See here for more on projections.

Geographic vs Projected Coordinate Systems

Image source: Jochen Albrecht

Geographic Coordinate Systems (GCS)

  • Location measured from the curved surface of the earth
  • Measurement units: latitude and longitude
  • Degrees-minutes-seconds (DMS), decimal degrees (DD) or radians (rad)

Projected Coordinate Systems (PCS)

  • Flat surface
  • Units can be in metres, feet, inches
  • Distortions will occur, except for very fine scale maps

Read more here and here
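Converting between the DMS and DD notations mentioned above is simple arithmetic. A Python sketch (the example value is roughly the latitude of our Uppsala point):

```python
def dms_to_dd(degrees, minutes, seconds):
    """Convert degrees-minutes-seconds (DMS) to decimal degrees (DD)."""
    sign = -1 if degrees < 0 else 1
    return sign * (abs(degrees) + minutes / 60 + seconds / 3600)

# 59 deg 50' 21.35" N is roughly the latitude of our point
lat_dd = dms_to_dd(59, 50, 21.35)
print(round(lat_dd, 6))  # ~59.839264
```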

Spatial Reference System Identifier (SRID)

SRIDs are numeric codes identifying spatial reference systems.

List of SRID with their attributes: https://spatialreference.org/ref/epsg/3006/

Coordinate transformations

Source: geoXchange

Back to our point: let's compare the WKT in two different CRSs,

  • WGS84 (EPSG:4326) and
  • SWEREF99 TM (EPSG:3006).

To do that, save the point in each of the formats we looked at (shapefile, CSV-WKT, and GeoJSON), but this time with a different CRS.

Note that the transformed point shapefiles are larger than the originals:

$ pwd
.../3006

$ ls -S -lh | awk '{print $5, $9}'
 
570B point3006.qpj
379B point3006.prj
100B point3006.shp
100B point3006.shx
98B point3006.dbf
5B point3006.cpg

$ pwd
.../4326

$ ls -S -lh | awk '{print $5, $9}'
 
257B point4326.qpj
147B point4326.dbf
143B point4326.prj
128B point4326.shp
108B point4326.shx
5B point4326.cpg
Name                   WGS84                         SWEREF99
EPSG                   4326                          3006
Units                  Degrees                       Metres
WKT                    POINT (17.647075 59.839264)   POINT (648337.212857818 6636474.10921653)
WKT file size          76 bytes                      128 bytes
GeoJSON file size      305 bytes                     353 bytes
Shapefiles (together)  788 bytes                     1,252 bytes

Compare .prj files for

  • WGS84 GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]

  • SWEREF99 PROJCS["SWEREF99_TM",GEOGCS["GCS_SWEREF99",DATUM["D_SWEREF99",SPHEROID["GRS_1980",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",15],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["Meter",1]]

Look up the CRS you want, e.g. SWEREF99

Datum - the reference points and reference surface used to relate the coordinate system to the Earth, e.g. North American Datum 1983 (NAD83) or World Geodetic System 1984 (WGS84)

Data stored in GIS are always distorted, contain errors, and are only a representation of the world with estimated positions (see Jere Folgert's video for more)

False northing is a linear value applied to the origin of the y coordinates. False easting and northing values are usually applied to ensure that all x and y values are positive. You can also use the false easting and northing parameters to reduce the range of the x or y coordinate values (more here and here).
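For SWEREF99 TM, the .prj above gives a false easting of 500 000 m at the 15°E central meridian. Subtracting it from our point's easting (taken from the comparison table above) recovers the true distance east of that meridian, which is all the parameter does:

```python
# SWEREF99 TM parameters (from the .prj shown above)
false_easting = 500000.0   # metres added to x at the central meridian
central_meridian = 15.0    # degrees east

# Easting of our point in SWEREF99 (from the WKT comparison above)
x_sweref = 648337.212857818

# Subtracting the false easting gives the distance east of the central
# meridian; the offset keeps all x values across Sweden positive.
east_of_cm = x_sweref - false_easting
print(east_of_cm)  # ~148 km east of the 15 degree E meridian
```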

Adding multiple points, create .csv :

long,lat
17.6480052,59.8393701
17.6480341,59.8392894
17.6481147,59.8392956
17.6481432,59.8392159
17.6472424,59.8391136
17.6472631,59.8390557
17.6473433,59.8390544
17.6473458,59.8390704
17.6475238,59.8390772
17.647536,59.8389047
17.6474709,59.8389008
17.6474767,59.8388648
17.6473751,59.8388618
17.6473811,59.8387833
17.6484274,59.8388934
17.6484565,59.8388052
17.648535,59.8388155
17.6485727,59.838736
17.647413,59.8386109
17.6474341,59.8385565
17.6475478,59.8385595
17.6475474,59.8385703
17.6478522,59.8385804
17.6478664,59.8383792
17.6475097,59.8383669
17.6475481,59.8382687
17.6486244,59.8383864
17.6486543,59.8383085
17.6487378,59.8383203
17.6487746,59.8382488
17.6476091,59.8381245
17.6476395,59.8380363
17.6480262,59.8380806
17.6481183,59.8378764
17.647737,59.8378336
17.6477676,59.8377667
17.6486223,59.837865
17.6486551,59.8377838
17.6487355,59.8377901
17.6487695,59.8377133
17.6478238,59.8376243
17.6478937,59.83747
17.6479301,59.8374764
17.6479606,59.83739
17.6477726,59.8373749
17.6477583,59.8374074
17.6476953,59.8374007
17.6476289,59.8375505
17.647422,59.8380329
17.6473997,59.8380848
17.6473907,59.8381057
17.6462149,59.8379803
17.6461825,59.8380509
17.6462697,59.8380615
17.6462406,59.8381342
17.6473376,59.8382476
17.6471772,59.838613
17.6466979,59.8385603
17.6467378,59.838468
17.6465739,59.8383313
17.6461422,59.8382847
17.6460358,59.8385507
17.6461231,59.8385563
17.6460883,59.8386495
17.6471381,59.838757
17.6471013,59.8388454
17.6460051,59.838735
17.6459241,59.83898
17.6458227,59.8389721
17.645788,59.8390506
17.6458626,59.8390585
17.6458292,59.8391408
17.6460294,59.8391601
17.6468978,59.8392528
17.6469334,59.8392565
17.6473675,59.8393028
17.6479529,59.839365
17.6480052,59.8393701

Open the csv (lat is y, long is x). Save as a shapefile in 3006.

You can create additional columns. We shall do this using GUI, but if you set up Postgres, you can use SQL queries.

Right-click the layer > attribute table > open field calculator > create fields for x and y

This is how the WKT looks:

WKT,long,lat,x,y
"POINT (648388.848931084 6636488.00213679)",17.648005200000000,59.839370099999996,648389,6636488
"POINT (648390.827050854 6636479.0844731)",17.648034100000000,59.839289399999998,648391,6636479
"POINT (648395.314532686 6636479.95512511)",17.648114700000001,59.839295600000000,648395,6636480
"POINT (648397.265812052 6636471.14787468)",17.648143200000000,59.839215899999999,648397,6636471
"POINT (648347.259564427 6636457.74363919)",17.647242400000000,59.839113599999997,648347,6636458
"POINT (648348.67678627 6636451.34537085)",17.647263100000000,59.839055700000003,648349,6636451

The x, y coordinates are in meters.
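Metre units make distance computations trivial: in a projected CRS, plain Euclidean distance between coordinate pairs is already in metres. A quick Python check using the first two projected points above:

```python
import math

# First two projected points from the WKT above (SWEREF99, metres)
p1 = (648388.848931084, 6636488.00213679)
p2 = (648390.827050854, 6636479.0844731)

# In a projected CRS with metre units, Euclidean distance is in metres
dist_m = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
print(round(dist_m, 2))  # roughly 9 metres apart
```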

Polygons

Let's create a polygon in QGIS. Make myPolygon.csv:

WKT,gid
"MULTIPOLYGON (((17.6453 59.8395,17.649 59.8395,17.649 59.8373,17.6453 59.8373,17.6453 59.8395)))",111

This is geojson of the same polygon:

{
"type": "FeatureCollection",
"name": "myPolygon3006",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:EPSG::3006" } },
"features": [
{ "type": "Feature", "properties": { "gid": 111 }, "geometry": { "type": "MultiPolygon", "coordinates": [ [ [ [ 648236.730797635158524, 6636496.404087456874549 ], [ 648443.997444280423224, 6636504.689628538675606 ], [ 648453.793062769225799, 6636259.816340153105557 ], [ 648246.51272902963683, 6636251.530436812900007 ], [ 648236.730797635158524, 6636496.404087456874549 ] ] ] ] } }
]
}

Note how the first and last point coordinates are the same.
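You can verify the ring closure programmatically. A small Python sketch parsing the WKT above (naive string splitting, enough for this single-ring case):

```python
# WKT of the rectangle from myPolygon.csv above
wkt = ("MULTIPOLYGON (((17.6453 59.8395,17.649 59.8395,"
       "17.649 59.8373,17.6453 59.8373,17.6453 59.8395)))")

# Pull out the coordinate list of the (single) ring
inner = wkt[wkt.index("(((") + 3 : wkt.index(")))")]
ring = [tuple(map(float, pair.split())) for pair in inner.split(",")]

# A valid ring repeats its first vertex as its last vertex
print(ring[0] == ring[-1])  # the ring is closed
```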

Polygons in Magellan

source: Ram's presentation

To add a Polyline create myPolyline.csv:

WKT,full_id
"MULTILINESTRING ((17.6453 59.8395,17.649 59.8395,17.649 59.8373,17.6453 59.8373))",6

Geojson of the polyline reprojected to SWEREF99:

{
"type": "FeatureCollection",
"name": "theLine6662_3006",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:EPSG::3006" } },
"features": [
{ "type": "Feature", "properties": { "full_id": 6 }, "geometry": { "type": "MultiLineString", "coordinates": [ [ [ 648236.730797635158524, 6636496.404087456874549 ], [ 648443.997444280423224, 6636504.689628538675606 ], [ 648453.793062769225799, 6636259.816340153105557 ], [ 648246.51272902963683, 6636251.530436812900007 ] ] ] } }
]
}

Creating geometry

Drawing geometry

The order:

  • add a new layer (polygon, 3006)
  • turn on editing for the new layer
  • add a new polygon
  • click to draw
  • right-click to finalise and fill in the attributes

Buffer

Create new geometry offset by distance of 13 m
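Conceptually, buffering a point produces a polygon whose boundary is a fixed distance from it. A minimal Python sketch approximating the buffer as an n-gon (an illustration only, not QGIS's actual buffering algorithm):

```python
import math

def buffer_point(x, y, radius, n=32):
    """Approximate a circular buffer around (x, y) as an n-gon (closed ring)."""
    ring = [(x + radius * math.cos(2 * math.pi * i / n),
             y + radius * math.sin(2 * math.pi * i / n)) for i in range(n)]
    return ring + [ring[0]]  # close the ring

# 13 m buffer around a projected point (metre units assumed, e.g. EPSG:3006)
ring = buffer_point(648388.85, 6636488.0, 13.0)
# every vertex sits exactly `radius` metres from the centre
```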

Bounding Box

Create new geometry from BB of the buffer layer

For point geometry this is done using "minimum bounding geometry".

There are plenty of such functions; those mentioned here are commonly used for spatial analysis. Another useful one:

Voronoi polygons

To try yourself: create a shapefile in 3006 of Voronoi polygons (select some buffer distance) and add columns with area, perimeter, and an auto-incremented gid.

Check out basic statistics and histogram for area field

Simple queries

Select by attribute

The mean value of the area was ≈ 749.8. Select only the polygons with area larger than the mean:

Save as a separate shapefile (same as usual but check "selected only")
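The attribute selection above is just a filter over the attribute table. A small Python sketch with hypothetical area values (chosen so the mean matches the ≈ 749.8 above):

```python
# areas of some Voronoi polygons (hypothetical values, square metres)
areas = [412.0, 980.5, 633.2, 1210.7, 512.6]

mean_area = sum(areas) / len(areas)

# "select by attribute" with the condition: area > mean
larger = [a for a in areas if a > mean_area]
print(mean_area, larger)
```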

Predicates

Understanding spatial joins

From Boundlessgeo put very nicely: 11. Spatial Relationships

From ESRI: IRelationalOperator Interface where logic is based on the geom elements:

In Magellan

Intersects

Intersects returns t (TRUE) if the intersection does not result in an empty set. Intersects returns the exact opposite result of disjoint.

Within

Within returns t (TRUE) if the first geometry is completely within the second geometry. Within tests for the exact opposite result of contains.

Contains

Contains returns t (TRUE) if the second geometry is completely contained by the first geometry. The contains predicate returns the exact opposite result of the within predicate.
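At their core, these predicates rest on point-in-polygon tests. A minimal even-odd ray-casting sketch (an illustration only, not Magellan's actual implementation; points exactly on the boundary are ambiguous under this test):

```python
def point_in_polygon(px, py, ring):
    """Even-odd ray casting: count edge crossings of a ray going right from the point."""
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        # does edge (x1,y1)-(x2,y2) cross the horizontal ray at py, right of px?
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:
                inside = not inside
    return inside

# unit square, the same shape used in the Magellan examples below
square = [(1.0, 1.0), (1.0, -1.0), (-1.0, -1.0), (-1.0, 1.0)]
print(point_in_polygon(0.0, 0.0, square))   # strictly within
print(point_in_polygon(2.0, 0.0, square))   # outside
```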

Not in Magellan, from ESRI

Equal

Disjoint

Touch

Overlap

Cross

Source: ESRI Understanding spatial relations

Within

Equivalent PostgreSQL query:

SELECT * 
FROM points3006try AS a 
  INNER JOIN myPolygon2 AS b
    ON st_within(a.geom, b.geom)

Contains

Equivalent PostgreSQL query:

SELECT * 
FROM largerVoronoi AS a 
  INNER JOIN extractedRedPoints AS b
    ON st_contains(b.geom, a.geom)

Intersects

Equivalent PostgreSQL query:

SELECT * 
FROM largerVoronoi AS a 
  INNER JOIN myPolygon2 AS b
    ON st_intersects(a.geom, b.geom)

Even though this says INNER JOIN, conceptually a Cartesian join is performed first and then the non-matching pairs are filtered out.
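"Cartesian join then filter" can be sketched in a few lines of Python (toy data, with axis-aligned boxes standing in for polygons):

```python
# A naive spatial join: cross product of the two tables, then filter by the
# predicate -- conceptually what happens before spatial indexing prunes pairs.
points = [("p1", 0.5, 0.5), ("p2", 5.0, 5.0)]
boxes = [("b1", (0.0, 0.0, 1.0, 1.0)), ("b2", (4.0, 4.0, 6.0, 6.0))]

def within_box(x, y, box):
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax

joined = [(pid, bid)
          for pid, x, y in points
          for bid, box in boxes          # Cartesian product: |points| * |boxes| pairs
          if within_box(x, y, box)]      # ...then keep only the matches
print(joined)  # [('p1', 'b1'), ('p2', 'b2')]
```

Magellan avoids materialising the full cross product by spatially indexing the geometries, as the article below explains.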

displayHTML(frameIt("https://magellan.ghost.io/how-does-magellan-scale-geospatial-queries", 550))

Besides the predicates

GIS traditionally includes two types of spatial joins:

  • IRelationalOperator Interface: the result is a boolean (e.g. yes, intersects, or doesn't), and thus joined data (e.g. attributes of the polygon within which the points are)
  • ITopologicalOperator Interface: the result is a geometry (e.g. the intersection)

Joins of the second type produce geometry: intersection, difference, union, etc. (more here).

Overlay operations

Intersect

Clip

I'll show you some more live

Open Streets Maps (OSM)

For now we download as a shapefile and play with it in QGIS

Go to http://extract.bbbike.org/ zoom and select the desired area and extract

You should get an email from bbbike with the data:

Download and unzip in your local folder

displayHTML(frameIt("https://wiki.openstreetmap.org/wiki/Map_Features", 550))

Open the building layer in QGIS

Change the projection to 3006. Let us explore the OSM data

OSM contains more data than what we got from bbbike. We can display OSM raster tiles (if the connection works, you should be able to do this locally)

To work with OSM properly in QGIS you need plugins. I can try to show you pending success in installing plugins. Try on your own:

Some advanced operations: distance matrix, nearest neighbour, points-in-polygon, mean coordinates, ...

Count points in polygon

To download .osm format in Magellan do this later

1. We define an area of interest and find the coordinates of its boundary, AKA the "bounding box". To do this, go to https://www.openstreetmap.org and zoom roughly into the desired area.

2. To ingest data from OSM we use wget, in the following format:

wget -O MyFileName.osm "https://api.openstreetmap.org/api/0.6/map?bbox=l,b,r,t"

  • MyFileName.osm - give some informative file name

  • l = longitude of the LEFT boundary of the bounding box

  • b = latitude of the BOTTOM boundary of the bounding box

  • r = longitude of the RIGHT boundary of the bounding box

  • t = latitude of the TOP boundary of the bounding box

For instance if you know the bounding box, do:

  • TinyUppsalaCentrumWgot.osm - Tiny area in Uppsala Centrum

  • l = 17.63514

  • b = 59.85739

  • r = 17.64154

  • t = 59.86011

wget -O TinyUppsalaCentrumWgot.osm "https://api.openstreetmap.org/api/0.6/map?bbox=17.63514,59.85739,17.64154,59.86011"
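If you need several such extracts, the URL is easy to script. A small Python helper (hypothetical function name) that rebuilds the bbox query string used above:

```python
def osm_bbox_url(l, b, r, t):
    """Build the OSM API v0.6 map-export URL for a left,bottom,right,top bounding box."""
    return f"https://api.openstreetmap.org/api/0.6/map?bbox={l},{b},{r},{t}"

# Tiny area in Uppsala Centrum, same bounding box as above
url = osm_bbox_url(17.63514, 59.85739, 17.64154, 59.86011)
print(url)
```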

Check out the NYC Taxi Dataset in Magellan

This is a much larger dataset and we may need access to a larger cluster - unless we just analyse a smaller subset of the data (perhaps just a month of Taxi rides in NYC). We can understand the same concepts using a much smaller dataset of Uber rides in San Francisco. We will analyse this next.

The taxi data can be downloaded from here

Let's have a look at the NY neighbourhoods dataset (right-click save, or wget/curl): https://github.com/harsha2010/magellan/raw/master/examples/datasets/NYC-NEIGHBORHOODS/neighborhoods.geojson

Open it in QGIS and open attribute table

ScaDaMaLe Course site and book

Note for Spark 2.4.5

As of 2022-02-01, the latest Maven coordinates for Magellan do not work with Spark 2.4+, and Spark 3.0+ is not yet supported:

Use the binary jar from NEW JAR TO BE UPLOAD!! https://github.com/lamastex/scalable-data-science/tree/master/custom-builds/jars/magellan/forks on Databricks Runtime 6.6, Apache Spark 2.4.5, Scala 2.11 cluster.

Instructions

  1. Download NEW JAR TO BE UPLOADED!! https://github.com/lamastex/scalable-data-science/raw/master/custom-builds/jars/magellan/forks/magellan_2.11-1.0.7-SNAPSHOT.jar to your local machine.
  2. In Databricks choose Create -> Library and upload the packaged jar.
  3. Create a Spark 2.4.5 / Scala 2.11 cluster with the uploaded Magellan library installed. If you are already running such a cluster and have installed the uploaded library on it, detach and re-attach any notebook currently using that cluster.

NOTE: The magellan library's usual maven coordinates harsha2010:magellan:1.0.6-s_2.11 may be outdated, but it is here for your future reference. You can follow instructions here to assemble the master jar if needed: * https://github.com/lamastex/scalable-data-science/raw/master/custom-builds/jars/magellan/master

What is Geospatial Analytics?

(watch 3 minutes and 23 seconds: 111-314 seconds):

Spark Summit East 2016 - What is Geospatial Analytics by Ram Sri Harsha

Some Concrete Examples of Scalable Geospatial Analytics

Let us check out cross-domain data fusion in MSR's Urban Computing Group

Several sciences are naturally geospatial

  • forestry,
  • geography,
  • geology,
  • seismology,
  • ecology,
  • etc. etc.

See for example the global earthquake datastreams from the US Geological Survey below.

For a global data source, see the US Geological Survey's Earthquake Hazards Program: http://earthquake.usgs.gov/data/.

REDO

https://magellan.ghost.io/how-does-magellan-scale-geospatial-queries/

Introduction to Magellan for Scalable Geospatial Analytics

This is a minor augmentation of Ram Harsha's Magellan code blogged here: * magellan geospatial analytics in spark

def frameIt( u:String, h:Int ) : String = {
      """<iframe 
 src=""""+ u+""""
 width="95%" height="""" + h + """"
 sandbox>
  <p>
    <a href="http://spark.apache.org/docs/latest/index.html">
      Fallback link for browsers that, unlikely, don't support frames
    </a>
  </p>
</iframe>"""
   }
displayHTML(frameIt("https://magellan.ghost.io/how-does-magellan-scale-geospatial-queries/", 550))

Do we need one more geospatial analytics library?

From Ram's slide 4 of this Spark Summit East 2016 talk at slideshare:

  • Spatial Analytics at scale is challenging
    • Simplicity + Scalability = Hard
  • Ancient Data Formats
    • metadata, indexing not handled well, inefficient storage
  • Geospatial Analytics is not simply Business Intelligence anymore
    • Statistical + Machine Learning being leveraged in geospatial
  • Now is the time to do it!
    • Explosion of mobile data
    • Finer granularity of data collection for geometries
    • Analytics stretching the limits of traditional approaches
    • Spark SQL + Catalyst + Tungsten makes extensible SQL engines easier than ever before!

Let's get our hands dirty with basics in magellan.

Spatial Data Structures

  • Points
  • Polygons
  • Lines
  • Polylines

Users' View of Spatial Data Structures (details are typically "invisible" to user)

Predicates

  • within
  • intersects
  • contains
// create a points DataFrame
val points = sc.parallelize(Seq((-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0))).toDF("x", "y")
points: org.apache.spark.sql.DataFrame = [x: double, y: double]
// transform (lat,lon) into Point using custom user-defined function
import magellan.Point // just Point
import org.apache.spark.sql.functions.udf
val toPointUDF = udf{(x:Double,y:Double) => Point(x,y) }
import magellan.Point
import org.apache.spark.sql.functions.udf
toPointUDF: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,org.apache.spark.sql.types.PointUDT@37548d6,Some(List(DoubleType, DoubleType)))
// let's show the results of the DF with a new column called point
points.withColumn("point", toPointUDF($"x", $"y")).show()
+----+----+-----------------+
|   x|   y|            point|
+----+----+-----------------+
|-1.0|-1.0|Point(-1.0, -1.0)|
|-1.0| 1.0| Point(-1.0, 1.0)|
| 1.0|-1.0| Point(1.0, -1.0)|
+----+----+-----------------+
points.show
+----+----+
|   x|   y|
+----+----+
|-1.0|-1.0|
|-1.0| 1.0|
| 1.0|-1.0|
+----+----+
// Let's instead use the built-in expression to do the same - it's much faster on larger DataFrames due to code-gen
import org.apache.spark.sql.magellan.dsl.expressions._
val points = sc.parallelize(Seq((-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0))).toDF("x", "y").select(point($"x", $"y").as("point"))

points.show()
+-----------------+
|            point|
+-----------------+
|Point(-1.0, -1.0)|
| Point(-1.0, 1.0)|
| Point(1.0, -1.0)|
+-----------------+

import org.apache.spark.sql.magellan.dsl.expressions._
points: org.apache.spark.sql.DataFrame = [point: point]
display(points) // busted in bleeding-edge magellan we need for computing
point
Point(-1.0, -1.0)
Point(-1.0, 1.0)
Point(1.0, -1.0)

The latest version of magellan seems to have issues with the databricks display function. We will ignore this convenience of display and continue with our analysis.

This is a databricks display of magellan points when it is working properly in Spark 2.2.

Let's verify empirically if it is indeed faster for larger DataFrames.

// to generate a sequence of pairs of random numbers we can do:
import util.Random.nextDouble
Seq.fill(10)((-1.0*nextDouble,+1.0*nextDouble))
import util.Random.nextDouble
res7: Seq[(Double, Double)] = List((-0.4443119444291961,0.4405777408068594), (-0.3157738948550728,0.5017025352914497), (-0.9874637771136913,0.8519623151075828), (-0.9985464893592637,0.4278396438594432), (-0.3159292114428177,0.030487965422646535), (-0.7513798362079374,0.38194689908898793), (-0.507758712332592,0.7369770528847904), (-0.6697906990106479,0.6636420894550961), (-0.12535584996134563,0.6249808031956755), (-0.6102666766697349,0.6205652158691838))
// using the UDF method with 1 million points we can do a count action of the DF with point column
// don't add too many zeros as it may crash your driver program
sc.parallelize(Seq.fill(100000)((-1.0*nextDouble,+1.0*nextDouble)))
  .toDF("x", "y")
  .withColumn("point", toPointUDF('x, 'y))
  .count()
res8: Long = 100000
// it should be twice as fast with code-gen especially when we are ingesting from dbfs as opposed to 
// using Seq.fill in the driver...
sc.parallelize(Seq.fill(100000)((-1.0*nextDouble,+1.0*nextDouble)))
  .toDF("x", "y")
  .withColumn("point", point('x, 'y))
  .count()
res9: Long = 100000

Creating 100,000 points using the UDF method takes 3.12 seconds, while using Magellan's built-in point expression takes 1.35 seconds.

See https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html

// Create a Polygon DataFrame
import magellan.Polygon

case class PolygonExample(polygon: Polygon)

// do this in your head / pencil-paper / black-board going counter-clockwise
val ring = Array(Point(1.0, 1.0), Point(1.0, -1.0), Point(-1.0, -1.0), Point(-1.0, 1.0), Point(1.0, 1.0))
val polygon = Polygon(Array(0), ring)

val polygons = sc.parallelize(Seq(
  PolygonExample(Polygon(Array(0), ring))
)).toDF()
import magellan.Polygon
defined class PolygonExample
ring: Array[magellan.Point] = Array(Point(1.0, 1.0), Point(1.0, -1.0), Point(-1.0, -1.0), Point(-1.0, 1.0), Point(1.0, 1.0))
polygon: magellan.Polygon = magellan.Polygon@427f1ce6
polygons: org.apache.spark.sql.DataFrame = [polygon: polygon]
polygons.show(false)
+-------------------------+
|polygon                  |
+-------------------------+
|magellan.Polygon@fc63bb26|
+-------------------------+
display(polygons) // not much can be seen as its in the object
polygon
magellan.Polygon@63336515

This is a databricks display of magellan polygon when it is working properly in Spark 2.2 on another databricks run-time.

import org.apache.spark.sql.types._
import org.apache.spark.sql.types._
// join points with polygons upon intersection
points.join(polygons)
      .where($"point" intersects $"polygon")
      .count() 
res13: Long = 3
points.show()
+-----------------+
|            point|
+-----------------+
|Point(-1.0, -1.0)|
| Point(-1.0, 1.0)|
| Point(1.0, -1.0)|
+-----------------+

Pop Quiz:

Which of the three points intersect the polygon?

More generally we can have more complex queries as the generic polygon need not even be a convex set.

This is not an uncommon polygon - think of shapes of parks or lakes on a map.

A bounding box for a non-convex polygon

Let us consider our simple points and polygons we just made and consider the following points within polygon join query.

// join points with polygons upon within or containment
points.join(polygons)
      .where($"point" within $"polygon")
      .count()
res17: Long = 0
//creating line from two points
import magellan.Line

case class LineExample(line: Line)

val line = Line(Point(1.0, 1.0), Point(1.0, -1.0))

val lines = sc.parallelize(Seq(
      LineExample(line)
    )).toDF()

lines.show(false)
+---------------------------------------+
|line                                   |
+---------------------------------------+
|Line(Point(1.0, 1.0), Point(1.0, -1.0))|
+---------------------------------------+

import magellan.Line
defined class LineExample
line: magellan.Line = Line(Point(1.0, 1.0), Point(1.0, -1.0))
lines: org.apache.spark.sql.DataFrame = [line: line]
display(lines)
line
Line(Point(1.0, 1.0), Point(1.0, -1.0))

This is a databricks display of magellan lines when it is working properly!

// creating polyline
import magellan.PolyLine

case class PolyLineExample(polyline: PolyLine)

val ring = Array(Point(1.0, 1.0), Point(1.0, -1.0),
      Point(-1.0, -1.0), Point(-1.0, 1.0))

val polylines = sc.parallelize(Seq(
      PolyLineExample(PolyLine(Array(0), ring))
    )).toDF()
import magellan.PolyLine
defined class PolyLineExample
ring: Array[magellan.Point] = Array(Point(1.0, 1.0), Point(1.0, -1.0), Point(-1.0, -1.0), Point(-1.0, 1.0))
polylines: org.apache.spark.sql.DataFrame = [polyline: polyline]
polylines.show(false)
+--------------------------+
|polyline                  |
+--------------------------+
|magellan.PolyLine@6cc77052|
+--------------------------+

This is a databricks display of magellan polyline when it is working properly!

// now let's make a polyline with two or more lines out of the same ring
val polylines2 = sc.parallelize(Seq(
  PolyLineExample(PolyLine(Array(0,2), ring)) // first line starts at index 0 and second one starts at index 2
)).toDF()

polylines2.show(false)
+--------------------------+
|polyline                  |
+--------------------------+
|magellan.PolyLine@43efee0d|
+--------------------------+

polylines2: org.apache.spark.sql.DataFrame = [polyline: polyline]
import magellan.Point

val p = Point(1.0, -1.0)
import magellan.Point
p: magellan.Point = Point(1.0, -1.0)
//p. // uncomment line and put the cursor next to the . and hit TAB to see available methods on the magellan Point p
(p.getX, p.getY) // for example we can getX and getY values of the Point p
res26: (Double, Double) = (1.0,-1.0)
val pc = Point(0.0,0.0)
p.withinCircle(pc, 5.0) // check if Point p is within the circle of radius 5.0 around Point pc
pc: magellan.Point = Point(0.0, 0.0)
res27: Boolean = true
p.boundingBox // find the bounding box of p
res28: magellan.BoundingBox = BoundingBox(1.0,-1.0,1.0,-1.0)
import magellan.Point

// create a radius 0.5 buffered polygon about the centre given by Point(0.0, 1.0)
val aBufferedPolygon = Point(0.0, 1.0).buffer(0.5) 


magellan.esri.ESRIUtil.toESRIGeometry(aBufferedPolygon)

println(aBufferedPolygon)
magellan.Polygon@9f249027
import magellan.Point
aBufferedPolygon: magellan.Polygon = magellan.Polygon@9f249027

Dive here for more on magellan Point:

  • https://github.com/harsha2010/magellan/blob/master/src/main/scala/magellan/Point.scala

Knock yourself out on other Data Structures in the source.

Uber Trajectories in San Francisco

Dataset for the demo by Ram Sri Harsha at Spark Summit Europe 2015

First the datasets have to be loaded into distributed file store.

  • See Step 0: Downloading datasets and loading into dbfs below for doing this anew (This only needs to be done once if the data is persisted in the distributed file system).

After downloading the data, we expect to have the following files in distributed file system (dbfs):

  • all.tsv is the file of all uber trajectories
  • SFNbhd is the directory containing SF neighborhood shape files.
// display the contents of the dbfs directory "dbfs:/datasets/magellan/"
// - if you don't see files here then go to Step 0 below as explained above!
display(dbutils.fs.ls("dbfs:/datasets/magellan/")) 
path name size
dbfs:/datasets/magellan/SFNbhd/ SFNbhd/ 0.0
dbfs:/datasets/magellan/all.tsv all.tsv 6.0947802e7
ls /dbfs/datasets
alexandria
beijing
magellan
maps
mobile_sample
osm
sou
t-drive-trips
t-drive-trips-magellan
taxis

First five lines or rows of the uber data containing: tripId, timestamp, Lat, Lon (note that latitude comes before longitude in the file)

sc.textFile("dbfs:/datasets/magellan/all.tsv").take(5).foreach(println)
00001	2007-01-07T10:54:50+00:00	37.782551	-122.445368
00001	2007-01-07T10:54:54+00:00	37.782745	-122.444586
00001	2007-01-07T10:54:58+00:00	37.782842	-122.443688
00001	2007-01-07T10:55:02+00:00	37.782919	-122.442815
00001	2007-01-07T10:55:06+00:00	37.782992	-122.442112

The neighborhood shape files for San Francisco will form the polygons of interest to us.

The shapefile format can spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as name or temperature.
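Since a shapefile is really a collection of sibling files sharing one base name, a small sanity check before loading can save confusion. The following plain-Scala sketch (a hypothetical helper, not part of magellan) verifies that the three mandatory components are present: `.shp` (geometry), `.shx` (shape index) and `.dbf` (attribute table); files like `.prj` (projection) and `.sbn`/`.sbx` (spatial index) are optional extras.

```scala
// Mandatory shapefile components: .shp (geometry), .shx (index), .dbf (attributes).
// `fileNames` is assumed to be the list of file names in a directory.
def missingShapefileParts(fileNames: Seq[String], baseName: String): Seq[String] = {
  val mandatory = Seq(".shp", ".shx", ".dbf")
  mandatory.filterNot(ext => fileNames.contains(baseName + ext))
}

// The SFNbhd directory listed in the next cell has all mandatory parts:
val files = Seq("planning_neighborhoods.dbf", "planning_neighborhoods.prj",
                "planning_neighborhoods.shp", "planning_neighborhoods.shx")
println(missingShapefileParts(files, "planning_neighborhoods")) // List()
```

Note also that the `.dbf` attribute table limits field names to 10 characters, which is why the metadata key appears truncated as `neighborho` in the outputs further below.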

display(dbutils.fs.ls("dbfs:/datasets/magellan/SFNbhd")) // legacy shape files - used in various sectors
path name size
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.dbf planning_neighborhoods.dbf 1028.0
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.prj planning_neighborhoods.prj 567.0
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.sbn planning_neighborhoods.sbn 516.0
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.sbx planning_neighborhoods.sbx 164.0
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shp planning_neighborhoods.shp 214576.0
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shp.xml planning_neighborhoods.shp.xml 21958.0
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shx planning_neighborhoods.shx 396.0

Homework

First watch the more technical magellan presentation by Ram Sri Harsha (Hortonworks) at Spark Summit Europe 2015

[Ram Sri Harsha's Magellan Spark Summit EU 2015 Talk](https://www.youtube.com/watch?v=rP8H-xQTuM0)

Let's repeat Ram's original analysis from the following blog as done below.

Ram's blog at Hortonworks.

This is just to get you started... You may need to modify this!

case class UberRecord(tripId: String, timestamp: String, point: Point) // a case class for UberRecord 
defined class UberRecord
val uber = sc.textFile("dbfs:/datasets/magellan/all.tsv")
              .map { line =>
                      val parts = line.split("\t" )
                      val tripId = parts(0)
                      val timestamp = parts(1)
                      val point = Point(parts(3).toDouble, parts(2).toDouble)
                      UberRecord(tripId, timestamp, point)
                    }
                     //.repartition(100) // using default repartition
                     .toDF()
                     .cache()
uber: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [tripId: string, timestamp: string ... 1 more field]
val uberRecordCount = uber.count() // how many Uber records?
uberRecordCount: Long = 1128663

So there are over a million UberRecords.

sqlContext.read.format("magellan").load("dbfs:/datasets/magellan/SFNbhd/").printSchema()
root
 |-- point: point (nullable = true)
 |-- polyline: polyline (nullable = true)
 |-- polygon: polygon (nullable = true)
 |-- metadata: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
 |-- valid: boolean (nullable = true)
val neighborhoods = sqlContext.read.format("magellan") 
                                   .load("dbfs:/datasets/magellan/SFNbhd/")
                                   .select($"polygon", $"metadata")
                                   .cache()
neighborhoods: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [polygon: polygon, metadata: map<string,string>]
neighborhoods.count() // how many neighbourhoods in SF?
res36: Long = 37
neighborhoods.printSchema
root
 |-- polygon: polygon (nullable = true)
 |-- metadata: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
neighborhoods.show(2,false) // see the first two neighbourhoods
+-------------------------+-----------------------------------------+
|polygon                  |metadata                                 |
+-------------------------+-----------------------------------------+
|magellan.Polygon@5e8b7382|[neighborho -> Twin Peaks               ]|
|magellan.Polygon@aefbe87e|[neighborho -> Pacific Heights          ]|
+-------------------------+-----------------------------------------+
only showing top 2 rows

You Try:

Modify the next cell to see all 37 neighborhoods.

neighborhoods.show(37,false) // modify this cell to see all 37 neighborhoods
+-------------------------+-----------------------------------------+
|polygon                  |metadata                                 |
+-------------------------+-----------------------------------------+
|magellan.Polygon@9a519148|[neighborho -> Twin Peaks               ]|
|magellan.Polygon@2d5e862b|[neighborho -> Pacific Heights          ]|
|magellan.Polygon@eafc4a01|[neighborho -> Visitacion Valley        ]|
|magellan.Polygon@b87b053f|[neighborho -> Potrero Hill             ]|
|magellan.Polygon@a90162d5|[neighborho -> Crocker Amazon           ]|
|magellan.Polygon@bb49ff9c|[neighborho -> Outer Mission            ]|
|magellan.Polygon@fb06b113|[neighborho -> Bayview                  ]|
|magellan.Polygon@bafd0911|[neighborho -> Lakeshore                ]|
|magellan.Polygon@ad89232d|[neighborho -> Russian Hill             ]|
|magellan.Polygon@b3c46f20|[neighborho -> Golden Gate Park         ]|
|magellan.Polygon@5ff06533|[neighborho -> Outer Sunset             ]|
|magellan.Polygon@fa2cc9b5|[neighborho -> Inner Sunset             ]|
|magellan.Polygon@6beaa40b|[neighborho -> Excelsior                ]|
|magellan.Polygon@2befcdb6|[neighborho -> Outer Richmond           ]|
|magellan.Polygon@7f2f3423|[neighborho -> Parkside                 ]|
|magellan.Polygon@16cde909|[neighborho -> Bernal Heights           ]|
|magellan.Polygon@dd6fd499|[neighborho -> Noe Valley               ]|
|magellan.Polygon@965ebd1c|[neighborho -> Presidio                 ]|
|magellan.Polygon@6e73c0c2|[neighborho -> Nob Hill                 ]|
|magellan.Polygon@686b88b |[neighborho -> Financial District       ]|
|magellan.Polygon@a0d10f1b|[neighborho -> Glen Park                ]|
|magellan.Polygon@335cadb5|[neighborho -> Marina                   ]|
|magellan.Polygon@4eac537f|[neighborho -> Seacliff                 ]|
|magellan.Polygon@5b75bfd9|[neighborho -> Mission                  ]|
|magellan.Polygon@5e99ea57|[neighborho -> Downtown/Civic Center    ]|
|magellan.Polygon@d22d0489|[neighborho -> South of Market          ]|
|magellan.Polygon@6aaf808d|[neighborho -> Presidio Heights         ]|
|magellan.Polygon@5f470ef3|[neighborho -> Inner Richmond           ]|
|magellan.Polygon@7dba9eb4|[neighborho -> Castro/Upper Market      ]|
|magellan.Polygon@b9501895|[neighborho -> West of Twin Peaks       ]|
|magellan.Polygon@b213687c|[neighborho -> Ocean View               ]|
|magellan.Polygon@766d6fd4|[neighborho -> Treasure Island/YBI      ]|
|magellan.Polygon@48c45968|[neighborho -> Chinatown                ]|
|magellan.Polygon@d2c56329|[neighborho -> Western Addition         ]|
|magellan.Polygon@c92a684c|[neighborho -> North Beach              ]|
|magellan.Polygon@ce8caa28|[neighborho -> Diamond Heights          ]|
|magellan.Polygon@aac6c49d|[neighborho -> Haight Ashbury           ]|
+-------------------------+-----------------------------------------+
import org.apache.spark.sql.functions._ // this is needed for sql functions like explode, etc.
import org.apache.spark.sql.functions._
//names of all 37 neighborhoods of San Francisco
neighborhoods.select(explode($"metadata").as(Seq("k", "v"))).show(37,false)
+----------+-------------------------+
|k         |v                        |
+----------+-------------------------+
|neighborho|Twin Peaks               |
|neighborho|Pacific Heights          |
|neighborho|Visitacion Valley        |
|neighborho|Potrero Hill             |
|neighborho|Crocker Amazon           |
|neighborho|Outer Mission            |
|neighborho|Bayview                  |
|neighborho|Lakeshore                |
|neighborho|Russian Hill             |
|neighborho|Golden Gate Park         |
|neighborho|Outer Sunset             |
|neighborho|Inner Sunset             |
|neighborho|Excelsior                |
|neighborho|Outer Richmond           |
|neighborho|Parkside                 |
|neighborho|Bernal Heights           |
|neighborho|Noe Valley               |
|neighborho|Presidio                 |
|neighborho|Nob Hill                 |
|neighborho|Financial District       |
|neighborho|Glen Park                |
|neighborho|Marina                   |
|neighborho|Seacliff                 |
|neighborho|Mission                  |
|neighborho|Downtown/Civic Center    |
|neighborho|South of Market          |
|neighborho|Presidio Heights         |
|neighborho|Inner Richmond           |
|neighborho|Castro/Upper Market      |
|neighborho|West of Twin Peaks       |
|neighborho|Ocean View               |
|neighborho|Treasure Island/YBI      |
|neighborho|Chinatown                |
|neighborho|Western Addition         |
|neighborho|North Beach              |
|neighborho|Diamond Heights          |
|neighborho|Haight Ashbury           |
+----------+-------------------------+

This join below yields nothing.

So what's going on?

Watch Ram's 2015 Spark Summit talk for details on geospatial formats and transformations.
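A quick way to see why this join comes back empty: the uber points are in longitude/latitude degrees, while the shapefile polygons are in NAD83 California zone 403 state-plane feet, so the two coordinate ranges are numerically disjoint. A tiny sketch using values taken from cells in this notebook:

```scala
// A point from the uber data (degrees) and the same point after the NAD83
// transformation performed further below (state-plane feet).
val uberPoint  = (-122.445368, 37.782551)
val nad83Point = (5999523.477715266, 2113253.7290443885)

// Longitudes are bounded by 180 degrees, while state-plane coordinates run into
// the millions of feet, so a raw degree-valued point can never fall `within`
// any of these polygons.
println(math.abs(uberPoint._1) <= 180.0) // true
println(nad83Point._1 > 1.0e6)           // true
```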

neighborhoods
  .join(uber)
  .where($"point" within $"polygon")
  .select($"tripId", $"timestamp", explode($"metadata").as(Seq("k", "v")))
  .withColumnRenamed("v", "neighborhood")
  .drop("k")
  .show(5)
+------+---------+------------+
|tripId|timestamp|neighborhood|
+------+---------+------------+
+------+---------+------------+

We need the right transformer to map the points into the coordinate system of the shape files.

displayHTML(frameIt("https://en.wikipedia.org/wiki/North_American_Datum#North_American_Datum_of_1983",400))
// This code was removed from magellan in this commit:
// https://github.com/harsha2010/magellan/commit/8df0a62560116f8ed787fc7e86f190f8e2730826
// We bring this back to show how to roll our own transformations.
// EXERCISE: find existing transformers / methods in magellan or esri to go between coordinate systems 
import magellan.Point

class NAD83(params: Map[String, Any]) {
  val RAD = 180d / Math.PI
  val ER  = 6378137.toDouble  // semi-major axis for GRS-80
  val RF  = 298.257222101  // reciprocal flattening for GRS-80
  val F   = 1.toDouble / RF  // flattening for GRS-80
  val ESQ = F + F - (F * F)
  val E   = StrictMath.sqrt(ESQ)

  private val ZONES =  Map(
    401 -> Array(122.toDouble, 2000000.0001016,
      500000.0001016001, 40.0,
      41.66666666666667, 39.33333333333333),
    403 -> Array(120.5, 2000000.0001016,
      500000.0001016001, 37.06666666666667,
      38.43333333333333, 36.5)
  )

  def from() = {
    val zone = params("zone").asInstanceOf[Int]
    ZONES.get(zone) match {
      case Some(x) => if (x.length == 5) {
        toTransverseMercator(x)
      } else {
        toLambertConic(x)
      }
      case None => ???
    }
  }

  def to() = {
    val zone = params("zone").asInstanceOf[Int]
    ZONES.get(zone) match {
      case Some(x) => if (x.length == 5) {
        fromTransverseMercator(x)
      } else {
        fromLambertConic(x)
      }
      case None => ???
    }
  }

  def qqq(e: Double, s: Double) = {
    (StrictMath.log((1 + s) / (1 - s)) - e *
      StrictMath.log((1 + e * s) / (1 - e * s))) / 2
  }

  def toLambertConic(params: Array[Double]) = {
    val cm = params(0) / RAD  // CENTRAL MERIDIAN (CM)
    val eo = params(1)  // FALSE EASTING VALUE AT THE CM (METERS)
    val nb = params(2)  // FALSE NORTHING VALUE AT SOUTHERNMOST PARALLEL (METERS), (USUALLY ZERO)
    val fis = params(3) / RAD  // LATITUDE OF SO. STD. PARALLEL
    val fin = params(4) / RAD  // LATITUDE OF NO. STD. PARALLEL
    val fib = params(5) / RAD // LATITUDE OF SOUTHERNMOST PARALLEL
    val sinfs = StrictMath.sin(fis)
    val cosfs = StrictMath.cos(fis)
    val sinfn = StrictMath.sin(fin)
    val cosfn = StrictMath.cos(fin)
    val sinfb = StrictMath.sin(fib)
    val qs = qqq(E, sinfs)
    val qn = qqq(E, sinfn)
    val qb = qqq(E, sinfb)
    val w1 = StrictMath.sqrt(1.toDouble - ESQ * sinfs * sinfs)
    val w2 = StrictMath.sqrt(1.toDouble - ESQ * sinfn * sinfn)
    val sinfo = StrictMath.log(w2 * cosfs / (w1 * cosfn)) / (qn - qs)
    val k = ER * cosfs * StrictMath.exp(qs * sinfo) / (w1 * sinfo)
    val rb = k / StrictMath.exp(qb * sinfo)

    (point: Point) => {
      val (long, lat) = (point.getX(), point.getY())
      val l = - long / RAD
      val f = lat / RAD
      val q = qqq(E, StrictMath.sin(f))
      val r = k / StrictMath.exp(q * sinfo)
      val gam = (cm - l) * sinfo
      val n = rb + nb - (r * StrictMath.cos(gam))
      val e = eo + (r * StrictMath.sin(gam))
      Point(e, n)
    }
  }

  def toTransverseMercator(params: Array[Double]) = {
    (point: Point) => {
      point
    }
  }

  def fromLambertConic(params: Array[Double]) = {
    val cm = params(0) / RAD  // CENTRAL MERIDIAN (CM)
    val eo = params(1)  // FALSE EASTING VALUE AT THE CM (METERS)
    val nb = params(2)  // FALSE NORTHING VALUE AT SOUTHERNMOST PARALLEL (METERS), (USUALLY ZERO)
    val fis = params(3) / RAD  // LATITUDE OF SO. STD. PARALLEL
    val fin = params(4) / RAD  // LATITUDE OF NO. STD. PARALLEL
    val fib = params(5) / RAD // LATITUDE OF SOUTHERNMOST PARALLEL
    val sinfs = StrictMath.sin(fis)
    val cosfs = StrictMath.cos(fis)
    val sinfn = StrictMath.sin(fin)
    val cosfn = StrictMath.cos(fin)
    val sinfb = StrictMath.sin(fib)

    val qs = qqq(E, sinfs)
    val qn = qqq(E, sinfn)
    val qb = qqq(E, sinfb)
    val w1 = StrictMath.sqrt(1.toDouble - ESQ * sinfs * sinfs)
    val w2 = StrictMath.sqrt(1.toDouble - ESQ * sinfn * sinfn)
    val sinfo = StrictMath.log(w2 * cosfs / (w1 * cosfn)) / (qn - qs)
    val k = ER * cosfs * StrictMath.exp(qs * sinfo) / (w1 * sinfo)
    val rb = k / StrictMath.exp(qb * sinfo)
    (point: Point) => {
      val easting = point.getX()
      val northing = point.getY()
      val npr = rb - northing + nb
      val epr = easting - eo
      val gam = StrictMath.atan(epr / npr)
      val lon = cm - (gam / sinfo)
      val rpt = StrictMath.sqrt(npr * npr + epr * epr)
      val q = StrictMath.log(k / rpt) / sinfo
      val temp = StrictMath.exp(q + q)
      var sine = (temp - 1.toDouble) / (temp + 1.toDouble)
      var f1, f2 = 0.0
      for (i <- 0 until 2) {
        f1 = ((StrictMath.log((1.toDouble + sine) / (1.toDouble - sine)) - E *
          StrictMath.log((1.toDouble + E * sine) / (1.toDouble - E * sine))) / 2.toDouble) - q
        f2 = 1.toDouble / (1.toDouble - sine * sine) - ESQ / (1.toDouble - ESQ * sine * sine)
        sine -= (f1/ f2)
      }
      Point(StrictMath.toDegrees(lon) * -1, StrictMath.toDegrees(StrictMath.asin(sine)))
    }
  }

  def fromTransverseMercator(params: Array[Double]) = {
    val cm = params(0)  // CENTRAL MERIDIAN (CM)
    val fe = params(1)  // FALSE EASTING VALUE AT THE CM (METERS)
    val or = params(2) / RAD  // origin latitude
    val sf = 1.0 - (1.0 / params(3)) // scale factor
    val fn = params(4)  // false northing
    // translated from TCONPC subroutine
    val eps = ESQ / (1.0 - ESQ)
    val pr = (1.0 - F) * ER
    val en = (ER - pr) / (ER + pr)
    val en2 = en * en
    val en3 = en * en * en
    val en4 = en2 * en2

    var c2 = -3.0 * en / 2.0 + 9.0 * en3 / 16.0
    var c4 = 15.0d * en2 / 16.0d - 15.0d * en4 /32.0
    var c6 = -35.0 * en3 / 48.0
    var c8 = 315.0 * en4 / 512.0
    val u0 = 2.0 * (c2 - 2.0 * c4 + 3.0 * c6 - 4.0 * c8)
    val u2 = 8.0 * (c4 - 4.0 * c6 + 10.0 * c8)
    val u4 = 32.0 * (c6 - 6.0 * c8)
    val u6 = 129.0 * c8

    c2 = 3.0 * en / 2.0 - 27.0 * en3 / 32.0
    c4 = 21.0 * en2 / 16.0 - 55.0 * en4 / 32.0d
    c6 = 151.0 * en3 / 96.0
    c8 = 1097.0d * en4 / 512.0
    val v0 = 2.0 * (c2 - 2.0 * c4 + 3.0 * c6 - 4.0 * c8)
    val v2 = 8.0 * (c4 - 4.0 * c6 + 10.0 * c8)
    val v4 = 32.0 * (c6 - 6.0 * c8)
    val v6 = 128.0 * c8

    val r = ER * (1.0 - en) * (1.0 - en * en) * (1.0 + 2.25 * en * en + (225.0 / 64.0) * en4)
    val cosor = StrictMath.cos(or)
    val omo = or + StrictMath.sin(or) * cosor *
      (u0 + u2 * cosor * cosor + u4 * StrictMath.pow(cosor, 4) + u6 * StrictMath.pow(cosor, 6))
    val so = sf * r * omo

    (point: Point) => {
      val easting = point.getX()
      val northing = point.getY()
      // translated from TMGEOD subroutine
      val om = (northing - fn + so) / (r * sf)
      val cosom = StrictMath.cos(om)
      val foot = om + StrictMath.sin(om) * cosom *
        (v0 + v2 * cosom * cosom + v4 * StrictMath.pow(cosom, 4) + v6 * StrictMath.pow(cosom, 6))
      val sinf = StrictMath.sin(foot)
      val cosf = StrictMath.cos(foot)
      val tn = sinf / cosf
      val ts = tn * tn
      val ets = eps * cosf * cosf
      val rn = ER * sf / StrictMath.sqrt(1.0 - ESQ * sinf * sinf)
      val q = (easting - fe) / rn
      val qs = q * q
      val b2 = -tn * (1.0 + ets) / 2.0
      val b4 = -(5.0 + 3.0 * ts + ets * (1.0 - 9.0 * ts) - 4.0 * ets * ets) / 12.0
      val b6 = (61.0 + 45.0 * ts * (2.0 + ts) + ets * (46.0 - 252.0 * ts -60.0 * ts * ts)) / 360.0
      val b1 = 1.0
      val b3 = -(1.0 + ts + ts + ets) / 6.0
      val b5 = (5.0 + ts * (28.0 + 24.0 * ts) + ets * (6.0 + 8.0 * ts)) / 120.0
      val b7 = -(61.0 + 662.0 * ts + 1320.0 * ts * ts + 720.0 * StrictMath.pow(ts, 3)) / 5040.0
      val lat = foot + b2 * qs * (1.0 + qs * (b4 + b6 * qs))
      val l = b1 * q * (1.0 + qs * (b3 + qs * (b5 + b7 * qs)))
      val lon = -l / cosf + cm
      Point(StrictMath.toDegrees(lon) * -1, StrictMath.toDegrees(lat))
    }
  }
}
import magellan.Point
defined class NAD83
val transformer: Point => Point = (point: Point) => {
  val from = new NAD83(Map("zone" -> 403)).from()
  val p = point.transform(from)
  Point(3.28084 * p.getX, 3.28084 * p.getY)
}

// add a new column in nad83 coordinates
val uberTransformed = uber
                      .withColumn("nad83", $"point".transform(transformer))
                      .cache()
transformer: magellan.Point => magellan.Point = <function1>
uberTransformed: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [tripId: string, timestamp: string ... 2 more fields]
uberTransformed.count()
res43: Long = 1128663
uberTransformed.show(5,false) // nad83 transformed points
+------+-------------------------+-----------------------------+---------------------------------------------+
|tripId|timestamp                |point                        |nad83                                        |
+------+-------------------------+-----------------------------+---------------------------------------------+
|00001 |2007-01-07T10:54:50+00:00|Point(-122.445368, 37.782551)|Point(5999523.477715266, 2113253.7290443885) |
|00001 |2007-01-07T10:54:54+00:00|Point(-122.444586, 37.782745)|Point(5999750.8888492435, 2113319.6570987953)|
|00001 |2007-01-07T10:54:58+00:00|Point(-122.443688, 37.782842)|Point(6000011.08106823, 2113349.5785887106)  |
|00001 |2007-01-07T10:55:02+00:00|Point(-122.442815, 37.782919)|Point(6000263.898268142, 2113372.3716762937) |
|00001 |2007-01-07T10:55:06+00:00|Point(-122.442112, 37.782992)|Point(6000467.566895697, 2113394.7303657546) |
+------+-------------------------+-----------------------------+---------------------------------------------+
only showing top 5 rows
uberTransformed.select("tripId").distinct().count() // number of unique tripIds
res45: Long = 24999

Let's try the join again after the appropriate coordinate-system transformation.

val joined = neighborhoods
              .join(uberTransformed)
              .where($"nad83" within $"polygon")
              .select($"tripId", $"timestamp", explode($"metadata").as(Seq("k", "v")))
              .withColumnRenamed("v", "neighborhood")
              .drop("k")
              .cache()
joined: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [tripId: string, timestamp: string ... 1 more field]
val UberRecordsInNbhdsCount = joined.count() // about 131 seconds for first action (doing broadcast hash join)
UberRecordsInNbhdsCount: Long = 1085087
joined.explain
== Physical Plan ==
InMemoryTableScan [tripId#469, timestamp#470, neighborhood#929]
   +- InMemoryRelation [tripId#469, timestamp#470, neighborhood#929], StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *(1) Project [tripId#469, timestamp#470, v#924 AS neighborhood#929]
            +- *(1) Generate explode(metadata#580), [tripId#469, timestamp#470], false, [k#923, v#924]
               +- *(1) Project [metadata#580, tripId#469, timestamp#470]
                  +- *(1) BroadcastNestedLoopJoin BuildLeft, Inner, Within(nad83#745, polygon#579)
                     :- BroadcastExchange IdentityBroadcastMode, [id=#1719]
                     :  +- InMemoryTableScan [polygon#579, metadata#580]
                     :        +- InMemoryRelation [polygon#579, metadata#580], StorageLevel(disk, memory, deserialized, 1 replicas)
                     :              +- *(1) Scan ShapeFileRelation(dbfs:/datasets/magellan/SFNbhd/,Map(path -> dbfs:/datasets/magellan/SFNbhd/)) [polygon#579,metadata#580] PushedFilters: [], ReadSchema: struct<polygon:struct<type:int,xmin:double,ymin:double,xmax:double,ymax:double,indices:array<int>...
                     +- InMemoryTableScan [tripId#469, timestamp#470, nad83#745]
                           +- InMemoryRelation [tripId#469, timestamp#470, point#471, nad83#745], StorageLevel(disk, memory, deserialized, 1 replicas)
                                 +- *(1) Project [tripId#469, timestamp#470, point#471, transformer(point#471, <function1>) AS nad83#745]
                                    +- InMemoryTableScan [point#471, timestamp#470, tripId#469]
                                          +- InMemoryRelation [tripId#469, timestamp#470, point#471], StorageLevel(disk, memory, deserialized, 1 replicas)
                                                +- *(1) SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(input[0, line891d49738e2e4c728aab43b5afc9663a112.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$UberRecord, true]).tripId, true, false) AS tripId#469, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(input[0, line891d49738e2e4c728aab43b5afc9663a112.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$UberRecord, true]).timestamp, true, false) AS timestamp#470, newInstance(class org.apache.spark.sql.types.PointUDT).serialize AS point#471]
                                                   +- Scan[obj#468]
joined.show(5,false)
+------+-------------------------+-------------------------+
|tripId|timestamp                |neighborhood             |
+------+-------------------------+-------------------------+
|00001 |2007-01-07T10:54:50+00:00|Western Addition         |
|00001 |2007-01-07T10:54:54+00:00|Western Addition         |
|00001 |2007-01-07T10:54:58+00:00|Western Addition         |
|00001 |2007-01-07T10:55:02+00:00|Western Addition         |
|00001 |2007-01-07T10:55:06+00:00|Western Addition         |
+------+-------------------------+-------------------------+
only showing top 5 rows
uberRecordCount - UberRecordsInNbhdsCount // records not in the neighbourhood shape files
res49: Long = 43576
joined
  .groupBy($"neighborhood")
  .agg(countDistinct("tripId")
  .as("trips"))
  .orderBy(col("trips").desc)
  .show(5,false)
+-------------------------+-----+
|neighborhood             |trips|
+-------------------------+-----+
|South of Market          |9891 |
|Western Addition         |6794 |
|Downtown/Civic Center    |6697 |
|Financial District       |6038 |
|Mission                  |5620 |
+-------------------------+-----+
only showing top 5 rows

Other spatial algorithms in Spark are being explored for generic and more efficient scalable geospatial analytic tasks.

Read on for more spatial indexing structures.

  • SpatialSpark aims to provide efficient spatial operations using Apache Spark.
    • Spatial Partition
      • Generate a spatial partition from input dataset, currently Fixed-Grid Partition (FGP), Binary-Split Partition (BSP) and Sort-Tile Partition (STP) are supported.
    • Spatial Range Query
      • includes both indexed and non-indexed query (useful for neighbourhood searches)
  • z-order Knn join
    • A space-filling curve trick to index multi-dimensional metric data into 1 Dimension. See: ieee paper and the slides.
  • AkNN = All K Nearest Neighbours - identify the k nearest neighbours for all nodes simultaneously (continuous AkNN is the streaming form of AkNN)
    • need to identify the right resources to do this scalably.
  • spark-knn-graphs: https://github.com/tdebatty/spark-knn-graphs
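The z-order trick mentioned above can be sketched in a few lines: interleave the bits of the two coordinates into a single Morton code, so that a 1-D sort (or range index) roughly preserves 2-D locality. This is an illustrative toy (integer grid coordinates, 16 bits each), not the implementation used by any of the libraries above.

```scala
// Z-order (Morton) code: interleave the bits of two coordinates to map 2-D
// points onto a 1-D space-filling curve that roughly preserves spatial locality.
def morton(x: Int, y: Int, bits: Int = 16): Long = {
  var z = 0L
  for (i <- 0 until bits) {
    z |= ((x >> i) & 1L) << (2 * i)     // even bit positions take x's bits
    z |= ((y >> i) & 1L) << (2 * i + 1) // odd bit positions take y's bits
  }
  z
}

// The four cells of a 2x2 grid get codes 0..3; nearby points get nearby codes
// (usually), which lets a 1-D sort/index answer approximate neighbourhood queries.
println(morton(0, 0)) // 0
println(morton(1, 0)) // 1
println(morton(0, 1)) // 2
println(morton(3, 3)) // 15
```

The curve is not perfect: a few neighbouring points straddle "seams" and get distant codes, which is why z-order kNN joins re-check a window of candidates around each code.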

Step 0: Downloading datasets and loading into dbfs

  • get the Uber data
  • get the San Francisco neighborhood data
ls
conf
derby.log
eventlogs
ganglia
logs
wget https://raw.githubusercontent.com/dima42/uber-gps-analysis/master/gpsdata/all.tsv
#wget http://lamastex.org/datasets/public/geospatial/uber/all.tsv
--2022-02-01 14:21:17--  https://raw.githubusercontent.com/dima42/uber-gps-analysis/master/gpsdata/all.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 60947802 (58M) [text/plain]
Saving to: ‘all.tsv’

     0K .......... .......... .......... .......... ..........  0% 4.78M 12s
... (remaining wget progress output elided) ...
  3750K .......... .......... .......... .......... ..........  6% 57.3M 1s
  3800K .......... .......... .......... .......... ..........  6% 69.2M 1s
  3850K .......... .......... .......... .......... ..........  6% 74.5M 1s
  3900K .......... .......... .......... .......... ..........  6% 79.1M 1s
  3950K .......... .......... .......... .......... ..........  6% 73.3M 1s
  4000K .......... .......... .......... .......... ..........  6% 64.1M 1s
  4050K .......... .......... .......... .......... ..........  6% 85.2M 1s
  4100K .......... .......... .......... .......... ..........  6% 68.1M 1s
  4150K .......... .......... .......... .......... ..........  7% 63.9M 1s
  4200K .......... .......... .......... .......... ..........  7% 82.1M 1s
  4250K .......... .......... .......... .......... ..........  7% 57.8M 1s
  4300K .......... .......... .......... .......... ..........  7% 81.6M 1s
  4350K .......... .......... .......... .......... ..........  7% 4.48M 1s
  4400K .......... .......... .......... .......... ..........  7% 95.6M 1s
  4450K .......... .......... .......... .......... ..........  7% 77.9M 1s
  4500K .......... .......... .......... .......... ..........  7% 88.5M 1s
  4550K .......... .......... .......... .......... ..........  7% 73.6M 1s
  4600K .......... .......... .......... .......... ..........  7% 23.0M 1s
  4650K .......... .......... .......... .......... ..........  7%  126M 1s
  4700K .......... .......... .......... .......... ..........  7%  145M 1s
  4750K .......... .......... .......... .......... ..........  8% 64.5M 1s
  4800K .......... .......... .......... .......... ..........  8% 32.3M 1s
  4850K .......... .......... .......... .......... ..........  8% 11.0M 1s
  4900K .......... .......... .......... .......... ..........  8%  122M 1s
  4950K .......... .......... .......... .......... ..........  8% 9.11M 1s
  5000K .......... .......... .......... .......... ..........  8% 94.6M 1s
  5050K .......... .......... .......... .......... ..........  8%  100M 1s
  5100K .......... .......... .......... .......... ..........  8%  114M 1s
  5150K .......... .......... .......... .......... ..........  8%  102M 1s
  5200K .......... .......... .......... .......... ..........  8% 94.7M 1s
  5250K .......... .......... .......... .......... ..........  8%  107M 1s
  5300K .......... .......... .......... .......... ..........  8% 67.8M 1s
  5350K .......... .......... .......... .......... ..........  9%  121M 1s
  5400K .......... .......... .......... .......... ..........  9% 92.2M 1s
  5450K .......... .......... .......... .......... ..........  9%  134M 1s
  5500K .......... .......... .......... .......... ..........  9%  140M 1s
  5550K .......... .......... .......... .......... ..........  9% 93.9M 1s
  5600K .......... .......... .......... .......... ..........  9%  125M 1s
  5650K .......... .......... .......... .......... ..........  9%  118M 1s
  5700K .......... .......... .......... .......... ..........  9% 99.6M 1s
  5750K .......... .......... .......... .......... ..........  9%  121M 1s
  5800K .......... .......... .......... .......... ..........  9%  107M 1s
  5850K .......... .......... .......... .......... ..........  9%  142M 1s
  5900K .......... .......... .......... .......... ..........  9%  156M 1s
  5950K .......... .......... .......... .......... .......... 10%  104M 1s
  6000K .......... .......... .......... .......... .......... 10%  145M 1s
  6050K .......... .......... .......... .......... .......... 10%  153M 1s
  6100K .......... .......... .......... .......... .......... 10%  104M 1s
  6150K .......... .......... .......... .......... .......... 10% 94.7M 1s
  6200K .......... .......... .......... .......... .......... 10% 71.3M 1s
  6250K .......... .......... .......... .......... .......... 10%  104M 1s
  6300K .......... .......... .......... .......... .......... 10% 82.2M 1s
  6350K .......... .......... .......... .......... .......... 10% 86.7M 1s
  6400K .......... .......... .......... .......... .......... 10%  105M 1s
  6450K .......... .......... .......... .......... .......... 10%  100M 1s
  6500K .......... .......... .......... .......... .......... 11%  112M 1s
  6550K .......... .......... .......... .......... .......... 11% 66.9M 1s
  6600K .......... .......... .......... .......... .......... 11%  137M 1s
  6650K .......... .......... .......... .......... .......... 11%  102M 1s
  6700K .......... .......... .......... .......... .......... 11%  116M 1s
  6750K .......... .......... .......... .......... .......... 11%  115M 1s
  6800K .......... .......... .......... .......... .......... 11%  108M 1s
  6850K .......... .......... .......... .......... .......... 11%  143M 1s
  6900K .......... .......... .......... .......... .......... 11% 94.0M 1s
  6950K .......... .......... .......... .......... .......... 11%  100M 1s
  7000K .......... .......... .......... .......... .......... 11%  153M 1s
  7050K .......... .......... .......... .......... .......... 11% 77.2M 1s
  7100K .......... .......... .......... .......... .......... 12%  110M 1s
  7150K .......... .......... .......... .......... .......... 12% 63.3M 1s
  7200K .......... .......... .......... .......... .......... 12% 86.5M 1s
  7250K .......... .......... .......... .......... .......... 12% 88.8M 1s
  7300K .......... .......... .......... .......... .......... 12%  109M 1s
  7350K .......... .......... .......... .......... .......... 12% 85.1M 1s
  7400K .......... .......... .......... .......... .......... 12%  127M 1s
  7450K .......... .......... .......... .......... .......... 12%  128M 1s
  7500K .......... .......... .......... .......... .......... 12%  129M 1s
  7550K .......... .......... .......... .......... .......... 12%  141M 1s
  7600K .......... .......... .......... .......... .......... 12%  115M 1s
  7650K .......... .......... .......... .......... .......... 12%  122M 1s
  7700K .......... .......... .......... .......... .......... 13%  151M 1s
  7750K .......... .......... .......... .......... .......... 13% 34.2M 1s
  7800K .......... .......... .......... .......... .......... 13% 28.5M 1s
  7850K .......... .......... .......... .......... .......... 13% 30.1M 1s
  7900K .......... .......... .......... .......... .......... 13% 31.6M 1s
  7950K .......... .......... .......... .......... .......... 13% 26.2M 1s
  8000K .......... .......... .......... .......... .......... 13% 29.8M 1s
  8050K .......... .......... .......... .......... .......... 13% 31.2M 1s
  8100K .......... .......... .......... .......... .......... 13% 30.2M 1s
  8150K .......... .......... .......... .......... .......... 13% 24.6M 1s
  8200K .......... .......... .......... .......... .......... 13% 30.7M 1s
  8250K .......... .......... .......... .......... .......... 13% 33.0M 1s
  8300K .......... .......... .......... .......... .......... 14% 40.0M 1s
  8350K .......... .......... .......... .......... .......... 14% 33.0M 1s
  8400K .......... .......... .......... .......... .......... 14% 31.6M 1s
  8450K .......... .......... .......... .......... .......... 14% 35.9M 1s
  8500K .......... .......... .......... .......... .......... 14% 37.3M 1s
  8550K .......... .......... .......... .......... .......... 14% 25.2M 1s
  8600K .......... .......... .......... .......... .......... 14% 31.4M 1s
  8650K .......... .......... .......... .......... .......... 14% 34.6M 1s
  8700K .......... .......... .......... .......... .......... 14% 35.7M 1s
  8750K .......... .......... .......... .......... .......... 14% 25.8M 1s
  8800K .......... .......... .......... .......... .......... 14% 28.4M 1s
  8850K .......... .......... .......... .......... .......... 14% 29.3M 1s
  8900K .......... .......... .......... .......... .......... 15% 28.5M 1s
  8950K .......... .......... .......... .......... .......... 15% 30.5M 1s
  9000K .......... .......... .......... .......... .......... 15% 26.6M 1s
  9050K .......... .......... .......... .......... .......... 15% 28.8M 1s
  9100K .......... .......... .......... .......... .......... 15% 28.9M 1s
  9150K .......... .......... .......... .......... .......... 15% 27.5M 1s
  9200K .......... .......... .......... .......... .......... 15% 29.0M 1s
  9250K .......... .......... .......... .......... .......... 15% 30.5M 1s
  9300K .......... .......... .......... .......... .......... 15% 29.2M 1s
  9350K .......... .......... .......... .......... .......... 15% 25.7M 1s
  9400K .......... .......... .......... .......... .......... 15% 25.8M 1s
  9450K .......... .......... .......... .......... .......... 15% 29.3M 1s
  9500K .......... .......... .......... .......... .......... 16% 3.77M 1s
  9550K .......... .......... .......... .......... .......... 16% 19.9M 1s
  9600K .......... .......... .......... .......... .......... 16% 22.7M 1s
  9650K .......... .......... .......... .......... .......... 16% 31.6M 1s
  9700K .......... .......... .......... .......... .......... 16% 26.2M 1s
  9750K .......... .......... .......... .......... .......... 16% 21.1M 1s
  9800K .......... .......... .......... .......... .......... 16% 18.2M 1s
  9850K .......... .......... .......... .......... .......... 16% 26.4M 1s
  9900K .......... .......... .......... .......... .......... 16% 38.4M 1s
  9950K .......... .......... .......... .......... .......... 16% 29.7M 1s
 10000K .......... .......... .......... .......... .......... 16% 35.5M 1s
 10050K .......... .......... .......... .......... .......... 16% 32.0M 1s
 10100K .......... .......... .......... .......... .......... 17% 31.2M 1s
 10150K .......... .......... .......... .......... .......... 17% 29.4M 1s
 10200K .......... .......... .......... .......... .......... 17% 29.2M 1s
 10250K .......... .......... .......... .......... .......... 17% 33.4M 1s
 10300K .......... .......... .......... .......... .......... 17% 35.5M 1s
 10350K .......... .......... .......... .......... .......... 17% 25.0M 1s
 10400K .......... .......... .......... .......... .......... 17% 22.0M 1s
 10450K .......... .......... .......... .......... .......... 17% 29.7M 1s
 10500K .......... .......... .......... .......... .......... 17% 28.5M 1s
 10550K .......... .......... .......... .......... .......... 17% 30.8M 1s
 10600K .......... .......... .......... .......... .......... 17% 99.1M 1s
 10650K .......... .......... .......... .......... .......... 17%  114M 1s
 10700K .......... .......... .......... .......... .......... 18%  104M 1s
 10750K .......... .......... .......... .......... .......... 18% 90.1M 1s
 10800K .......... .......... .......... .......... .......... 18%  106M 1s
 10850K .......... .......... .......... .......... .......... 18%  108M 1s
 10900K .......... .......... .......... .......... .......... 18%  108M 1s
 10950K .......... .......... .......... .......... .......... 18% 98.7M 1s
 11000K .......... .......... .......... .......... .......... 18%  103M 1s
 11050K .......... .......... .......... .......... .......... 18% 97.0M 1s
 11100K .......... .......... .......... .......... .......... 18%  105M 1s
 11150K .......... .......... .......... .......... .......... 18%  101M 1s
 11200K .......... .......... .......... .......... .......... 18% 83.4M 1s
 11250K .......... .......... .......... .......... .......... 18%  106M 1s
 11300K .......... .......... .......... .......... .......... 19%  125M 1s
 11350K .......... .......... .......... .......... .......... 19%  107M 1s
 11400K .......... .......... .......... .......... .......... 19%  103M 1s
 11450K .......... .......... .......... .......... .......... 19% 98.8M 1s
 11500K .......... .......... .......... .......... .......... 19%  107M 1s
 11550K .......... .......... .......... .......... .......... 19%  117M 1s
 11600K .......... .......... .......... .......... .......... 19%  120M 1s
 11650K .......... .......... .......... .......... .......... 19%  125M 1s
 11700K .......... .......... .......... .......... .......... 19%  114M 1s
 11750K .......... .......... .......... .......... .......... 19%  104M 1s
 11800K .......... .......... .......... .......... .......... 19%  115M 1s
 11850K .......... .......... .......... .......... .......... 19%  120M 1s
 11900K .......... .......... .......... .......... .......... 20%  118M 1s
 11950K .......... .......... .......... .......... .......... 20%  113M 1s
 12000K .......... .......... .......... .......... .......... 20%  119M 1s
 12050K .......... .......... .......... .......... .......... 20%  115M 1s
 12100K .......... .......... .......... .......... .......... 20%  106M 1s
 12150K .......... .......... .......... .......... .......... 20%  105M 1s
 12200K .......... .......... .......... .......... .......... 20%  122M 1s
 12250K .......... .......... .......... .......... .......... 20%  124M 1s
 12300K .......... .......... .......... .......... .......... 20%  120M 1s
 12350K .......... .......... .......... .......... .......... 20%  116M 1s
 12400K .......... .......... .......... .......... .......... 20%  115M 1s
 12450K .......... .......... .......... .......... .......... 21%  107M 1s
 12500K .......... .......... .......... .......... .......... 21%  119M 1s
 12550K .......... .......... .......... .......... .......... 21% 79.7M 1s
 12600K .......... .......... .......... .......... .......... 21%  159M 1s
 12650K .......... .......... .......... .......... .......... 21%  142M 1s
 12700K .......... .......... .......... .......... .......... 21%  132M 1s
 12750K .......... .......... .......... .......... .......... 21% 80.5M 1s
 12800K .......... .......... .......... .......... .......... 21%  121M 1s
 12850K .......... .......... .......... .......... .......... 21%  112M 1s
 12900K .......... .......... .......... .......... .......... 21%  119M 1s
 12950K .......... .......... .......... .......... .......... 21% 92.5M 1s
 13000K .......... .......... .......... .......... .......... 21%  100M 1s
 13050K .......... .......... .......... .......... .......... 22%  110M 1s
 13100K .......... .......... .......... .......... .......... 22%  118M 1s
 13150K .......... .......... .......... .......... .......... 22%  112M 1s
 13200K .......... .......... .......... .......... .......... 22% 5.60M 1s
 13250K .......... .......... .......... .......... .......... 22%  114M 1s
 13300K .......... .......... .......... .......... .......... 22%  100M 1s
 13350K .......... .......... .......... .......... .......... 22%  102M 1s
 13400K .......... .......... .......... .......... .......... 22%  105M 1s
 13450K .......... .......... .......... .......... .......... 22%  127M 1s
 13500K .......... .......... .......... .......... .......... 22% 93.5M 1s
 13550K .......... .......... .......... .......... .......... 22% 88.1M 1s
 13600K .......... .......... .......... .......... .......... 22%  120M 1s
 13650K .......... .......... .......... .......... .......... 23%  128M 1s
 13700K .......... .......... .......... .......... .......... 23%  101M 1s
 13750K .......... .......... .......... .......... .......... 23%  109M 1s
 13800K .......... .......... .......... .......... .......... 23%  150M 1s
 13850K .......... .......... .......... .......... .......... 23% 98.1M 1s
 13900K .......... .......... .......... .......... .......... 23%  119M 1s
 13950K .......... .......... .......... .......... .......... 23% 90.2M 1s
 14000K .......... .......... .......... .......... .......... 23%  125M 1s
 14050K .......... .......... .......... .......... .......... 23%  119M 1s
 14100K .......... .......... .......... .......... .......... 23%  119M 1s
 14150K .......... .......... .......... .......... .......... 23%  104M 1s
 14200K .......... .......... .......... .......... .......... 23%  127M 1s
 14250K .......... .......... .......... .......... .......... 24%  117M 1s
 14300K .......... .......... .......... .......... .......... 24%  123M 1s
 14350K .......... .......... .......... .......... .......... 24% 89.7M 1s
 14400K .......... .......... .......... .......... .......... 24%  113M 1s
 14450K .......... .......... .......... .......... .......... 24%  107M 1s
 14500K .......... .......... .......... .......... .......... 24%  125M 1s
 14550K .......... .......... .......... .......... .......... 24% 97.4M 1s
 14600K .......... .......... .......... .......... .......... 24%  111M 1s
 14650K .......... .......... .......... .......... .......... 24%  123M 1s
 14700K .......... .......... .......... .......... .......... 24%  114M 1s
 14750K .......... .......... .......... .......... .......... 24%  104M 1s
 14800K .......... .......... .......... .......... .......... 24%  108M 1s
 14850K .......... .......... .......... .......... .......... 25%  114M 1s
 14900K .......... .......... .......... .......... .......... 25%  114M 1s
 14950K .......... .......... .......... .......... .......... 25%  101M 1s
 15000K .......... .......... .......... .......... .......... 25%  112M 1s
 15050K .......... .......... .......... .......... .......... 25%  111M 1s
 15100K .......... .......... .......... .......... .......... 25%  119M 1s
 15150K .......... .......... .......... .......... .......... 25%  105M 1s
 15200K .......... .......... .......... .......... .......... 25%  116M 1s
 15250K .......... .......... .......... .......... .......... 25%  114M 1s
 15300K .......... .......... .......... .......... .......... 25%  127M 1s
 15350K .......... .......... .......... .......... .......... 25%  104M 1s
 15400K .......... .......... .......... .......... .......... 25%  125M 1s
 15450K .......... .......... .......... .......... .......... 26%  124M 1s
 15500K .......... .......... .......... .......... .......... 26%  105M 1s
 15550K .......... .......... .......... .......... .......... 26%  104M 1s
 15600K .......... .......... .......... .......... .......... 26%  106M 1s
 15650K .......... .......... .......... .......... .......... 26%  120M 1s
 15700K .......... .......... .......... .......... .......... 26%  111M 1s
 15750K .......... .......... .......... .......... .......... 26% 95.2M 1s
 15800K .......... .......... .......... .......... .......... 26%  126M 1s
 15850K .......... .......... .......... .......... .......... 26%  108M 1s
 15900K .......... .......... .......... .......... .......... 26%  119M 1s
 15950K .......... .......... .......... .......... .......... 26% 89.3M 1s
 16000K .......... .......... .......... .......... .......... 26%  111M 1s
 16050K .......... .......... .......... .......... .......... 27%  122M 1s
 16100K .......... .......... .......... .......... .......... 27%  114M 1s
 16150K .......... .......... .......... .......... .......... 27% 93.8M 1s

*** WARNING: skipped 41040 bytes of output ***

 43200K .......... .......... .......... .......... .......... 72%  124M 0s
 43250K .......... .......... .......... .......... .......... 72%  107M 0s
 43300K .......... .......... .......... .......... .......... 72% 11.8M 0s
 43350K .......... .......... .......... .......... .......... 72% 52.4M 0s
 43400K .......... .......... .......... .......... .......... 73% 66.4M 0s
 43450K .......... .......... .......... .......... .......... 73% 68.0M 0s
 43500K .......... .......... .......... .......... .......... 73%  115M 0s
 43550K .......... .......... .......... .......... .......... 73%  109M 0s
 43600K .......... .......... .......... .......... .......... 73%  111M 0s
 43650K .......... .......... .......... .......... .......... 73%  117M 0s
 43700K .......... .......... .......... .......... .......... 73% 55.8M 0s
 43750K .......... .......... .......... .......... .......... 73%  101M 0s
 43800K .......... .......... .......... .......... .......... 73%  113M 0s
 43850K .......... .......... .......... .......... .......... 73%  125M 0s
 43900K .......... .......... .......... .......... .......... 73%  118M 0s
 43950K .......... .......... .......... .......... .......... 73%  100M 0s
 44000K .......... .......... .......... .......... .......... 74%  129M 0s
 44050K .......... .......... .......... .......... .......... 74%  124M 0s
 44100K .......... .......... .......... .......... .......... 74% 50.6M 0s
 44150K .......... .......... .......... .......... .......... 74% 50.6M 0s
 44200K .......... .......... .......... .......... .......... 74% 66.3M 0s
 44250K .......... .......... .......... .......... .......... 74% 68.9M 0s
 44300K .......... .......... .......... .......... .......... 74% 66.5M 0s
 44350K .......... .......... .......... .......... .......... 74% 61.2M 0s
 44400K .......... .......... .......... .......... .......... 74%  125M 0s
 44450K .......... .......... .......... .......... .......... 74%  132M 0s
 44500K .......... .......... .......... .......... .......... 74%  108M 0s
 44550K .......... .......... .......... .......... .......... 74%  100M 0s
 44600K .......... .......... .......... .......... .......... 75%  118M 0s
 44650K .......... .......... .......... .......... .......... 75%  119M 0s
 44700K .......... .......... .......... .......... .......... 75% 96.6M 0s
 44750K .......... .......... .......... .......... .......... 75% 75.6M 0s
 44800K .......... .......... .......... .......... .......... 75% 84.7M 0s
 44850K .......... .......... .......... .......... .......... 75% 93.7M 0s
 44900K .......... .......... .......... .......... .......... 75% 26.2M 0s
 44950K .......... .......... .......... .......... .......... 75% 51.1M 0s
 45000K .......... .......... .......... .......... .......... 75% 57.5M 0s
 45050K .......... .......... .......... .......... .......... 75% 52.9M 0s
 45100K .......... .......... .......... .......... .......... 75% 54.3M 0s
 45150K .......... .......... .......... .......... .......... 75% 55.2M 0s
 45200K .......... .......... .......... .......... .......... 76% 66.7M 0s
 45250K .......... .......... .......... .......... .......... 76% 57.9M 0s
 45300K .......... .......... .......... .......... .......... 76% 63.0M 0s
 45350K .......... .......... .......... .......... .......... 76% 49.8M 0s
 45400K .......... .......... .......... .......... .......... 76% 64.7M 0s
 45450K .......... .......... .......... .......... .......... 76% 67.4M 0s
 45500K .......... .......... .......... .......... .......... 76% 66.6M 0s
 45550K .......... .......... .......... .......... .......... 76% 71.5M 0s
 45600K .......... .......... .......... .......... .......... 76%  127M 0s
 45650K .......... .......... .......... .......... .......... 76%  118M 0s
 45700K .......... .......... .......... .......... .......... 76%  119M 0s
 45750K .......... .......... .......... .......... .......... 76%  110M 0s
 45800K .......... .......... .......... .......... .......... 77% 56.2M 0s
 45850K .......... .......... .......... .......... .......... 77% 58.3M 0s
 45900K .......... .......... .......... .......... .......... 77% 62.2M 0s
 45950K .......... .......... .......... .......... .......... 77% 49.2M 0s
 46000K .......... .......... .......... .......... .......... 77% 44.7M 0s
 46050K .......... .......... .......... .......... .......... 77%  104M 0s
 46100K .......... .......... .......... .......... .......... 77%  123M 0s
 46150K .......... .......... .......... .......... .......... 77% 95.9M 0s
 46200K .......... .......... .......... .......... .......... 77% 94.1M 0s
 46250K .......... .......... .......... .......... .......... 77%  105M 0s
 46300K .......... .......... .......... .......... .......... 77%  114M 0s
 46350K .......... .......... .......... .......... .......... 77%  101M 0s
 46400K .......... .......... .......... .......... .......... 78%  115M 0s
 46450K .......... .......... .......... .......... .......... 78% 59.9M 0s
 46500K .......... .......... .......... .......... .......... 78% 66.9M 0s
 46550K .......... .......... .......... .......... .......... 78% 61.3M 0s
 46600K .......... .......... .......... .......... .......... 78%  119M 0s
 46650K .......... .......... .......... .......... .......... 78%  117M 0s
 46700K .......... .......... .......... .......... .......... 78%  118M 0s
 46750K .......... .......... .......... .......... .......... 78% 73.3M 0s
 46800K .......... .......... .......... .......... .......... 78% 43.7M 0s
 46850K .......... .......... .......... .......... .......... 78% 65.8M 0s
 46900K .......... .......... .......... .......... .......... 78% 69.9M 0s
 46950K .......... .......... .......... .......... .......... 78% 91.2M 0s
 47000K .......... .......... .......... .......... .......... 79%  115M 0s
 47050K .......... .......... .......... .......... .......... 79% 59.8M 0s
 47100K .......... .......... .......... .......... .......... 79% 67.6M 0s
 47150K .......... .......... .......... .......... .......... 79%  109M 0s
 47200K .......... .......... .......... .......... .......... 79%  123M 0s
 47250K .......... .......... .......... .......... .......... 79%  116M 0s
 47300K .......... .......... .......... .......... .......... 79% 90.3M 0s
 47350K .......... .......... .......... .......... .......... 79% 97.4M 0s
 47400K .......... .......... .......... .......... .......... 79% 84.2M 0s
 47450K .......... .......... .......... .......... .......... 79% 44.8M 0s
 47500K .......... .......... .......... .......... .......... 79% 63.1M 0s
 47550K .......... .......... .......... .......... .......... 79% 73.1M 0s
 47600K .......... .......... .......... .......... .......... 80%  114M 0s
 47650K .......... .......... .......... .......... .......... 80%  125M 0s
 47700K .......... .......... .......... .......... .......... 80% 98.4M 0s
 47750K .......... .......... .......... .......... .......... 80% 52.6M 0s
 47800K .......... .......... .......... .......... .......... 80%  108M 0s
 47850K .......... .......... .......... .......... .......... 80%  123M 0s
 47900K .......... .......... .......... .......... .......... 80%  120M 0s
 47950K .......... .......... .......... .......... .......... 80%  106M 0s
 48000K .......... .......... .......... .......... .......... 80%  119M 0s
 48050K .......... .......... .......... .......... .......... 80%  122M 0s
 48100K .......... .......... .......... .......... .......... 80% 56.2M 0s
 48150K .......... .......... .......... .......... .......... 80% 47.5M 0s
 48200K .......... .......... .......... .......... .......... 81% 55.7M 0s
 48250K .......... .......... .......... .......... .......... 81% 65.2M 0s
 48300K .......... .......... .......... .......... .......... 81% 84.8M 0s
 48350K .......... .......... .......... .......... .......... 81% 11.5M 0s
 48400K .......... .......... .......... .......... .......... 81% 17.6M 0s
 48450K .......... .......... .......... .......... .......... 81% 48.7M 0s
 48500K .......... .......... .......... .......... .......... 81% 77.4M 0s
 48550K .......... .......... .......... .......... .......... 81% 5.73M 0s
 48600K .......... .......... .......... .......... .......... 81% 19.9M 0s
 48650K .......... .......... .......... .......... .......... 81% 15.9M 0s
 48700K .......... .......... .......... .......... .......... 81% 22.3M 0s
 48750K .......... .......... .......... .......... .......... 81% 19.3M 0s
 48800K .......... .......... .......... .......... .......... 82% 21.3M 0s
 48850K .......... .......... .......... .......... .......... 82% 21.8M 0s
 48900K .......... .......... .......... .......... .......... 82% 19.3M 0s
 48950K .......... .......... .......... .......... .......... 82% 15.5M 0s
 49000K .......... .......... .......... .......... .......... 82% 12.9M 0s
 49050K .......... .......... .......... .......... .......... 82% 40.4M 0s
2022-02-01 14:21:20 (58.9 MB/s) - ‘all.tsv’ saved [60947802/60947802]
pwd
/databricks/driver
dbutils.fs.mkdirs("dbfs:/datasets/magellan") //need not be done again!
res55: Boolean = true
dbutils.fs.cp("file:/databricks/driver/all.tsv", "dbfs:/datasets/magellan/") // load into dbfs
res56: Boolean = true
display(dbutils.fs.ls("dbfs:/datasets/magellan/"))
path name size
dbfs:/datasets/magellan/SFNbhd/ SFNbhd/ 0.0
dbfs:/datasets/magellan/all.tsv all.tsv 6.0947802e7
wget http://www.lamastex.org/courses/ScalableDataScience/2016/datasets/magellan/UberSF/planning_neighborhoods.zip
--2022-02-01 14:21:24--  http://www.lamastex.org/courses/ScalableDataScience/2016/datasets/magellan/UberSF/planning_neighborhoods.zip
Resolving www.lamastex.org (www.lamastex.org)... 166.62.28.100
Connecting to www.lamastex.org (www.lamastex.org)|166.62.28.100|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 163771 (160K) [application/zip]
Saving to: ‘planning_neighborhoods.zip’


2022-02-01 14:21:25 (315 KB/s) - ‘planning_neighborhoods.zip’ saved [163771/163771]
unzip planning_neighborhoods.zip
Archive:  planning_neighborhoods.zip
  inflating: planning_neighborhoods.dbf  
  inflating: planning_neighborhoods.shx  
  inflating: planning_neighborhoods.shp.xml  
  inflating: planning_neighborhoods.shp  
  inflating: planning_neighborhoods.sbx  
  inflating: planning_neighborhoods.sbn  
  inflating: planning_neighborhoods.prj  
ls -al
total 59968
drwxr-xr-x 1 root root     4096 Feb  1 14:21 .
drwxr-xr-x 1 root root     4096 Feb  1 13:54 ..
-rw-r--r-- 1 root root 60947802 Feb  1 14:21 all.tsv
drwxr-xr-x 2 root root     4096 Jan  1  1970 conf
-rw-r--r-- 1 root root      704 Feb  1 13:54 derby.log
drwxr-xr-x 3 root root     4096 Feb  1 13:54 eventlogs
drwxr-xr-x 2 root root     4096 Feb  1 14:15 ganglia
drwxr-xr-x 2 root root     4096 Feb  1 14:00 logs
-rw-r--r-- 1 root root     1028 Jan 20  2012 planning_neighborhoods.dbf
-rw-r--r-- 1 root root      567 Jan 20  2012 planning_neighborhoods.prj
-rw-r--r-- 1 root root      516 Jan 20  2012 planning_neighborhoods.sbn
-rw-r--r-- 1 root root      164 Jan 20  2012 planning_neighborhoods.sbx
-rw-r--r-- 1 root root   214576 Jan 20  2012 planning_neighborhoods.shp
-rw-r--r-- 1 root root    21958 Jan 20  2012 planning_neighborhoods.shp.xml
-rw-r--r-- 1 root root      396 Jan 20  2012 planning_neighborhoods.shx
-rw-r--r-- 1 root root   163771 Nov  9  2015 planning_neighborhoods.zip
mv planning_neighborhoods.zip orig_planning_neighborhoods.zip

Let's prepare the files in a local directory named SFNbhd

  • make a directory called SFNbhd using the command mkdir SFNbhd
  • after making the directory, chain the next command with && to move the files starting with planning_nei into the new SFNbhd directory:
    • mv planning_nei* SFNbhd
  • list the contents of the current directory using ls
  • finally list the contents of the directory SFNbhd inside current directory using ls -al SFNbhd
mkdir SFNbhd && mv planning_nei* SFNbhd && ls 
ls -al SFNbhd
SFNbhd
all.tsv
conf
derby.log
eventlogs
ganglia
logs
orig_planning_neighborhoods.zip
total 264
drwxr-xr-x 2 root root   4096 Feb  1 14:21 .
drwxr-xr-x 1 root root   4096 Feb  1 14:21 ..
-rw-r--r-- 1 root root   1028 Jan 20  2012 planning_neighborhoods.dbf
-rw-r--r-- 1 root root    567 Jan 20  2012 planning_neighborhoods.prj
-rw-r--r-- 1 root root    516 Jan 20  2012 planning_neighborhoods.sbn
-rw-r--r-- 1 root root    164 Jan 20  2012 planning_neighborhoods.sbx
-rw-r--r-- 1 root root 214576 Jan 20  2012 planning_neighborhoods.shp
-rw-r--r-- 1 root root  21958 Jan 20  2012 planning_neighborhoods.shp.xml
-rw-r--r-- 1 root root    396 Jan 20  2012 planning_neighborhoods.shx
dbutils.fs.mkdirs("dbfs:/datasets/magellan/SFNbhd") //make the directory in dbfs - need not be done again!
res58: Boolean = true
// just copy each file - done for pedantic reasons; we can do more sophisticated dbfs loads for large shape files
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.dbf", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.prj", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.sbn", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.sbx", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.shp", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.shp.xml", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.shx", "dbfs:/datasets/magellan/SFNbhd/")
res59: Boolean = true
display(dbutils.fs.ls("dbfs:/datasets/magellan/SFNbhd/"))
path name size
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.dbf planning_neighborhoods.dbf 1028.0
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.prj planning_neighborhoods.prj 567.0
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.sbn planning_neighborhoods.sbn 516.0
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.sbx planning_neighborhoods.sbx 164.0
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shp planning_neighborhoods.shp 214576.0
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shp.xml planning_neighborhoods.shp.xml 21958.0
dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shx planning_neighborhoods.shx 396.0
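The seven file-by-file copies above get the job done, but for shapefiles with more sidecar files a loop over the extensions is less error-prone. The sketch below is hypothetical, not from the notebook: the path construction is plain Scala, while the `dbutils.fs.cp` call (only available inside Databricks) is left commented out.

```scala
// Shapefile sidecar extensions to copy into dbfs, matching the files above.
val exts = Seq("dbf", "prj", "sbn", "sbx", "shp", "shp.xml", "shx")

// Build (source, destination) pairs for each sidecar file.
val pairs = exts.map { e =>
  (s"file:/databricks/driver/SFNbhd/planning_neighborhoods.$e",
   "dbfs:/datasets/magellan/SFNbhd/")
}

// Inside Databricks, the actual copy would be:
// pairs.foreach { case (src, dst) => dbutils.fs.cp(src, dst) }
```

This keeps the list of extensions in one place, so adding another sidecar file means editing one `Seq` rather than pasting another `cp` line.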

ScaDaMaLe Course site and book

By Marina Toger

TODO: Raaz - re-liven for 2021...

OSM

1. We define an area of interest and find the coordinates of its boundary, a.k.a. its "bounding box". To do this, go to https://www.openstreetmap.org and zoom roughly into the desired area. The coordinates of the bounding box can then be read off using the Export option.

2. To ingest data from OSM we use wget in the following format:

wget -O MyFileName.osm "https://api.openstreetmap.org/api/0.6/map?bbox=l,b,r,t"

  • MyFileName.osm - give some informative file name

  • l = longitude of the LEFT boundary of the bounding box

  • b = latitude of the BOTTOM boundary of the bounding box

  • r = longitude of the RIGHT boundary of the bounding box

  • t = latitude of the TOP boundary of the bounding box

For instance, if you know the bounding box, do:

  • TinyUppsalaCentrumWgot.osm - Tiny area in Uppsala Centrum

  • l = 17.63514

  • b = 59.85739

  • r = 17.64154

  • t = 59.86011

wget -O TinyUppsalaCentrumWgot.osm "https://api.openstreetmap.org/api/0.6/map?bbox=17.63514,59.85739,17.64154,59.86011"
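Since the URL passed to wget is just the four bounding-box coordinates interpolated in l,b,r,t order, a small helper can build it and avoid ordering mistakes. This is a hypothetical sketch; `osmBBoxUrl` is not part of the notebook:

```scala
// Build the OSM API map URL from bounding-box coordinates, in the
// order the API expects: left lon, bottom lat, right lon, top lat.
def osmBBoxUrl(l: Double, b: Double, r: Double, t: Double): String =
  s"https://api.openstreetmap.org/api/0.6/map?bbox=$l,$b,$r,$t"

// The TinyUppsalaCentrum bounding box used in the wget line above.
val tinyUppsalaCentrum = osmBBoxUrl(17.63514, 59.85739, 17.64154, 59.86011)
```

Passing the coordinates as named positional arguments in one place makes it harder to swap latitude and longitude, a common source of empty or wrong-area downloads.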
//Imports
import magellan._
import magellan._
ls
conf
derby.log
eventlogs
ganglia
logs
wget -O UppsalaCentrumWgot.osm "https://api.openstreetmap.org/api/0.6/map?bbox=17.6244,59.8464,17.6661,59.8643"
--2022-02-01 14:26:09--  https://api.openstreetmap.org/api/0.6/map?bbox=17.6244,59.8464,17.6661,59.8643
Resolving api.openstreetmap.org (api.openstreetmap.org)... 130.117.76.11, 130.117.76.12, 130.117.76.13, ...
Connecting to api.openstreetmap.org (api.openstreetmap.org)|130.117.76.11|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/xml]
Saving to: ‘UppsalaCentrumWgot.osm’

2022-02-01 14:26:13 (5.79 MB/s) - ‘UppsalaCentrumWgot.osm’ saved [8667122]
pwd
ls
/databricks/driver
conf
derby.log
eventlogs
ganglia
logs
display(dbutils.fs.ls("dbfs:///datasets/"))
path name size
dbfs:/datasets/alexandria/ alexandria/ 0.0
dbfs:/datasets/beijing/ beijing/ 0.0
dbfs:/datasets/magellan/ magellan/ 0.0
dbfs:/datasets/maps/ maps/ 0.0
dbfs:/datasets/mobile_sample/ mobile_sample/ 0.0
dbfs:/datasets/osm/ osm/ 0.0
dbfs:/datasets/sou/ sou/ 0.0
dbfs:/datasets/t-drive-trips/ t-drive-trips/ 0.0
dbfs:/datasets/t-drive-trips-magellan/ t-drive-trips-magellan/ 0.0
dbfs:/datasets/taxis/ taxis/ 0.0
// making directory in distributed file system
dbutils.fs.mkdirs("dbfs:///datasets/maps/")
res1: Boolean = true
display(dbutils.fs.ls("dbfs:///datasets/maps/"))
path name size
dbfs:/datasets/maps/StockholmCentrumWgot.osm StockholmCentrumWgot.osm 3820982.0
dbfs:/datasets/maps/TinyUppsalaCentrumWgot.osm TinyUppsalaCentrumWgot.osm 919097.0
dbfs:/datasets/maps/UppsalaCentrumWgot.osm UppsalaCentrumWgot.osm 8667122.0
// copy file from local fs to dbfs
dbutils.fs.cp("file:///databricks/driver/UppsalaCentrumWgot.osm","dbfs:///datasets/maps/")
res4: Boolean = true
display(dbutils.fs.ls("dbfs:///datasets/maps/"))
path name size
dbfs:/datasets/maps/StockholmCentrumWgot.osm StockholmCentrumWgot.osm 3820982.0
dbfs:/datasets/maps/TinyUppsalaCentrumWgot.osm TinyUppsalaCentrumWgot.osm 919097.0
dbfs:/datasets/maps/UppsalaCentrumWgot.osm UppsalaCentrumWgot.osm 8667122.0
//Read the data from dbfs
val path = "dbfs:/datasets/maps/UppsalaCentrumWgot.osm"
val uppsalaCentrumOsmDF = spark.read
      .format("magellan")
      .option("type", "osm")
      .load(path)
path: String = dbfs:/datasets/maps/UppsalaCentrumWgot.osm
uppsalaCentrumOsmDF: org.apache.spark.sql.DataFrame = [point: point, polyline: polyline ... 3 more fields]
uppsalaCentrumOsmDF.show()
+-----+--------------------+--------------------+--------------------+-----+
|point|            polyline|             polygon|            metadata|valid|
+-----+--------------------+--------------------+--------------------+-----+
| null|magellan.PolyLine...|                null|[electrified -> c...| true|
| null|magellan.PolyLine...|                null|                  []| true|
| null|magellan.PolyLine...|                null|[electrified -> c...| true|
| null|magellan.PolyLine...|                null|[electrified -> c...| true|
| null|magellan.PolyLine...|                null|[electrified -> c...| true|
| null|magellan.PolyLine...|                null|[electrified -> c...| true|
| null|magellan.PolyLine...|                null|[electrified -> c...| true|
| null|magellan.PolyLine...|                null|  [landuse -> grass]| true|
| null|                null|magellan.Polygon@...|                  []| true|
| null|                null|magellan.Polygon@...|[natural -> grass...| true|
| null|                null|magellan.Polygon@...|  [landuse -> grass]| true|
| null|                null|magellan.Polygon@...|  [landuse -> grass]| true|
| null|                null|magellan.Polygon@...|  [landuse -> grass]| true|
| null|                null|magellan.Polygon@...|   [leisure -> park]| true|
| null|magellan.PolyLine...|                null|[highway -> cycle...| true|
| null|magellan.PolyLine...|                null|[leisure -> park,...| true|
| null|magellan.PolyLine...|                null|[bicycle -> desig...| true|
| null|magellan.PolyLine...|                null|[name -> Bolandgy...| true|
| null|                null|magellan.Polygon@...|   [building -> yes]| true|
| null|                null|magellan.Polygon@...|[electrified -> n...| true|
+-----+--------------------+--------------------+--------------------+-----+
only showing top 20 rows
display(uppsalaCentrumOsmDF)
uppsalaCentrumOsmDF.count
res9: Long = 32112
wget -O TinyUppsalaCentrumWgot.osm "https://api.openstreetmap.org/api/0.6/map?bbox=17.63514,59.85739,17.64154,59.86011"
--2022-02-02 08:56:29--  https://api.openstreetmap.org/api/0.6/map?bbox=17.63514,59.85739,17.64154,59.86011
Resolving api.openstreetmap.org (api.openstreetmap.org)... 130.117.76.13, 130.117.76.11, 130.117.76.12, ...
Connecting to api.openstreetmap.org (api.openstreetmap.org)|130.117.76.13|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/xml]
Saving to: ‘TinyUppsalaCentrumWgot.osm’

     [... wget download progress output elided ...]

2022-02-02 08:56:29 (3.56 MB/s) - ‘TinyUppsalaCentrumWgot.osm’ saved [919096]
pwd
ls
/databricks/driver
TinyUppsalaCentrumWgot.osm
conf
derby.log
eventlogs
ganglia
logs
// copy file from local fs to dbfs
dbutils.fs.cp("file:///databricks/driver/TinyUppsalaCentrumWgot.osm","dbfs:///datasets/maps/")
display(dbutils.fs.ls("dbfs:///datasets/maps/"))
path name size
dbfs:/datasets/maps/StockholmCentrumWgot.osm StockholmCentrumWgot.osm 3820982.0
dbfs:/datasets/maps/TinyUppsalaCentrumWgot.osm TinyUppsalaCentrumWgot.osm 919096.0
dbfs:/datasets/maps/UppsalaCentrumWgot.osm UppsalaCentrumWgot.osm 8667122.0
//read the file from dbfs
val path = "dbfs:/datasets/maps/TinyUppsalaCentrumWgot.osm"
val tinyUppsalaCentrumOsmDF = spark.read
      .format("magellan")
      .option("type", "osm")
      .load(path)
display(tinyUppsalaCentrumOsmDF)
tinyUppsalaCentrumOsmDF.count
res12: Long = 1857

Setting up Leaflet

To use Leaflet independently, you need to visit the following URL and set up an access token in Mapbox:

  • https://leafletjs.com/examples/quick-start/
  • Request an access token:
    • https://account.mapbox.com/auth/signin/?route-to=%22/access-tokens/%22

Visualise with Leaflet:

Take an array of Strings in GeoJSON format and insert it into a prebuilt HTML string that contains all the code necessary to display these features using Leaflet. The resulting HTML can be displayed in Databricks using the displayHTML function.

See http://leafletjs.com/examples/geojson.html for a detailed example of using GeoJSON with Leaflet.
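The GeoJSON-to-HTML idea can be sketched in plain Scala (no Spark needed). The helper name `toGeoJsonPoint` is our own illustrative choice, not part of Leaflet or Magellan; a feature string like this could be spliced into the HTML template and rendered via `L.geoJSON(...)`.

```scala
// Build a GeoJSON Point feature string from a coordinate pair.
// Note: GeoJSON uses [longitude, latitude] order, the reverse of Leaflet markers.
def toGeoJsonPoint(lat: Double, lon: Double, label: String): String = {
  s"""{"type": "Feature",
     | "geometry": {"type": "Point", "coordinates": [$lon, $lat]},
     | "properties": {"popupContent": "$label"}}""".stripMargin
}

val feature = toGeoJsonPoint(59.839264, 17.647075, "Ångströmlaboratoriet")
println(feature)
```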

//val point1 = sc.parallelize(Seq((59.839264, 17.647075),(59.9, 17.88))).toDF("x", "y")
val point1 = sc.parallelize(Seq((59.839264, 17.647075))).toDF("x", "y")
val point1c = point1.collect()
val string2 = point1c.mkString(",")
//df.select(columns: _*).collect.map(_.toSeq)
val string22 = "'random_string'"
point1: org.apache.spark.sql.DataFrame = [x: double, y: double]
point1c: Array[org.apache.spark.sql.Row] = Array([59.839264,17.647075])
string2: String = [59.839264,17.647075]
string22: String = 'random_string'
def genLeafletHTML(): String = {
  val accessToken = "pk.eyJ1IjoiZHRnIiwiYSI6ImNpaWF6MGdiNDAwanNtemx6MmIyNXoyOWIifQ.ndbNtExCMXZHKyfNtEN0Vg"

  val generatedHTML = f"""<!DOCTYPE html>  
  <html>
  <head>
        <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.7/leaflet.css">
        <style>#map {width: 600px; height:400px;}</style>
  </head>
  
  <body>
      <div id="map" style="width: 600px; height: 400px"></div>
      <script src="https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.7/leaflet.js"></script>
      <script type="text/javascript">
          var map = L.map('map').setView([59.838, 17.646865], 16);

          L.tileLayer('https://api.tiles.mapbox.com/v4/{id}/{z}/{x}/{y}.png?access_token=$accessToken', {
            maxZoom: 19
            , id: 'mapbox.streets'
            , attribution: '<a href="http://openstreetmap.org">OpenStreetMap</a> ' +
              '<a href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a> ' +
              '| &copy; <a href="http://mapbox.com">Mapbox</a>'
          }).addTo(map);
          
          str1 = 'SDS<br>Ångströmlaboratoriet<br>59.839264, 17.647075<br>';
          str2 = ${string22};
          var popup = str1.concat(str2);
          
          L.marker(${string2}).addTo(map)
              .bindPopup(popup)
              .openPopup();

      </script>
  </body>  
  """
  generatedHTML
}

displayHTML(genLeafletHTML)

Specifying the time frame we are interested in.

import java.sql.Timestamp

val startTime: Timestamp = Timestamp.valueOf("2008-02-03 00:00:00.0")
val endTime: Timestamp = Timestamp.valueOf("2008-02-03 01:00:00.0")
startTime: java.sql.Timestamp = 2008-02-03 00:00:00.0
endTime: java.sql.Timestamp = 2008-02-03 01:00:00.0
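As a quick aside on these timestamps: `java.sql.Timestamp.valueOf` expects the JDBC escape format `yyyy-[m]m-[d]d hh:mm:ss[.f...]`, and the window membership test below relies only on `Timestamp`'s `before` ordering. A minimal sketch, assuming a half-open interval `[start, end)` (our own convention here; the notebook's query may treat the boundary differently):

```scala
import java.sql.Timestamp

// Timestamp.valueOf parses the JDBC escape format "yyyy-[m]m-[d]d hh:mm:ss[.f...]".
val start = Timestamp.valueOf("2008-02-03 00:00:00.0")
val end   = Timestamp.valueOf("2008-02-03 01:00:00.0")

// A point in time is inside the window if it is not before the start
// and strictly before the end.
def inWindow(t: Timestamp): Boolean = !t.before(start) && t.before(end)

inWindow(Timestamp.valueOf("2008-02-03 00:30:00.0"))  // true
inWindow(Timestamp.valueOf("2008-02-03 01:30:00.0"))  // false
```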

Now the getIntersectingTrips function can be run to find the data points that intersect the space-time volume.

val intersectingTrips = polygonDF.getIntersectingTrips(taxiDataSparkParquetRead, startTime, endTime) // taxiData
intersectingTrips: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [polygon: polygon, taxiId: int ... 2 more fields]
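Conceptually, the query above combines a spatial containment test with a time-window test. Magellan evaluates a general polygon-contains predicate distributedly; the following is only a simplified in-memory analogue using an axis-aligned bounding box, with the `Ping` case class and `intersecting` helper being our own hypothetical names:

```scala
import java.sql.Timestamp

// One GPS ping from a taxi: id, timestamp, and position.
case class Ping(taxiId: Int, t: Timestamp, lat: Double, lon: Double)

// Keep pings that fall inside the bounding box AND inside [start, end).
def intersecting(pings: Seq[Ping],
                 minLat: Double, maxLat: Double,
                 minLon: Double, maxLon: Double,
                 start: Timestamp, end: Timestamp): Seq[Ping] =
  pings.filter { p =>
    p.lat >= minLat && p.lat <= maxLat &&
    p.lon >= minLon && p.lon <= maxLon &&
    !p.t.before(start) && p.t.before(end)
  }

val pings = Seq(
  Ping(6568, Timestamp.valueOf("2008-02-03 00:09:05"), 39.91, 116.40),
  Ping(4912, Timestamp.valueOf("2008-02-03 02:00:00"), 39.91, 116.40), // outside time window
  Ping(1242, Timestamp.valueOf("2008-02-03 00:03:38"), 41.00, 116.40)  // outside bounding box
)
val hits = intersecting(pings, 39.8, 40.0, 116.3, 116.5,
  Timestamp.valueOf("2008-02-03 00:00:00"), Timestamp.valueOf("2008-02-03 01:00:00"))
// only taxi 6568 survives both filters
```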

Here are all the taxi ids that pass through the polygon:

display(intersectingTrips.select($"taxiId", $"timeStamp"))
taxiId timeStamp
6568.0 2008-02-03T00:09:05.000+0000
4912.0 2008-02-03T00:23:17.000+0000
4566.0 2008-02-03T00:07:54.000+0000
7989.0 2008-02-03T00:33:51.000+0000
3911.0 2008-02-03T00:07:56.000+0000
9231.0 2008-02-03T00:20:51.000+0000
2751.0 2008-02-03T00:44:21.000+0000
3390.0 2008-02-03T00:40:08.000+0000
1242.0 2008-02-03T00:03:38.000+0000
8177.0 2008-02-03T00:20:26.000+0000
8528.0 2008-02-03T00:20:57.000+0000
1606.0 2008-02-03T00:45:28.000+0000
2917.0 2008-02-03T00:28:27.000+0000
4912.0 2008-02-03T00:23:17.000+0000

A list of all the taxis that take a trip around the square:

display(intersectingTrips.select($"taxiId").distinct)
taxiId
7989.0
3390.0
9231.0
1242.0
6568.0
8177.0
4912.0
2751.0
8528.0
3911.0
4566.0
2917.0
1606.0
display(intersectingTrips.groupBy($"taxiId").count.orderBy(-$"count"))
taxiId count
4912.0 2.0
7989.0 1.0
3390.0 1.0
9231.0 1.0
1242.0 1.0
6568.0 1.0
8177.0 1.0
2751.0 1.0
8528.0 1.0
3911.0 1.0
4566.0 1.0
2917.0 1.0
1606.0 1.0

Clean up your mess in distributed RAM

taxiDataSparkParquetRead.unpersist()
res29: taxiDataSparkParquetRead.type = [taxiId: int, timeStamp: timestamp ... 1 more field]
taxiDataSpark.unpersist()
res30: taxiDataSpark.type = [taxiId: int, timeStamp: timestamp ... 1 more field]
taxiRepartioned.unpersist()
res31: taxiRepartioned.type = MapPartitionsRDD[95] at repartition at command-2971213210274715:1
taxiData.unpersist()
res32: taxiData.type = [taxiId: int, timeStamp: timestamp ... 1 more field]
ls dbfs:/datasets/t-drive-trips
path name size
dbfs:/datasets/t-drive-trips/_SUCCESS _SUCCESS 0.0
dbfs:/datasets/t-drive-trips/_committed_3926031913428555637 _committed_3926031913428555637 10024.0
dbfs:/datasets/t-drive-trips/_committed_448783018784947015 _committed_448783018784947015 19934.0
dbfs:/datasets/t-drive-trips/_committed_vacuum5877992330899363965 _committed_vacuum5877992330899363965 96.0
dbfs:/datasets/t-drive-trips/_started_448783018784947015 _started_448783018784947015 0.0
dbfs:/datasets/t-drive-trips/part-00000-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-241-1-c000.snappy.parquet part-00000-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-241-1-c000.snappy.parquet 3316133.0
dbfs:/datasets/t-drive-trips/part-00001-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-242-1-c000.snappy.parquet part-00001-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-242-1-c000.snappy.parquet 3323394.0
dbfs:/datasets/t-drive-trips/part-00002-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-243-1-c000.snappy.parquet part-00002-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-243-1-c000.snappy.parquet 3309944.0
dbfs:/datasets/t-drive-trips/part-00003-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-244-1-c000.snappy.parquet part-00003-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-244-1-c000.snappy.parquet 3316166.0
dbfs:/datasets/t-drive-trips/part-00004-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-245-1-c000.snappy.parquet part-00004-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-245-1-c000.snappy.parquet 3307819.0
dbfs:/datasets/t-drive-trips/part-00005-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-246-1-c000.snappy.parquet part-00005-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-246-1-c000.snappy.parquet 3317356.0
dbfs:/datasets/t-drive-trips/part-00006-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-247-1-c000.snappy.parquet part-00006-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-247-1-c000.snappy.parquet 3322203.0
dbfs:/datasets/t-drive-trips/part-00007-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-248-1-c000.snappy.parquet part-00007-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-248-1-c000.snappy.parquet 3328137.0
dbfs:/datasets/t-drive-trips/part-00008-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-249-1-c000.snappy.parquet part-00008-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-249-1-c000.snappy.parquet 3320837.0
dbfs:/datasets/t-drive-trips/part-00009-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-250-1-c000.snappy.parquet part-00009-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-250-1-c000.snappy.parquet 3329892.0
dbfs:/datasets/t-drive-trips/part-00010-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-251-1-c000.snappy.parquet part-00010-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-251-1-c000.snappy.parquet 3324941.0
dbfs:/datasets/t-drive-trips/part-00011-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-252-1-c000.snappy.parquet part-00011-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-252-1-c000.snappy.parquet 3321528.0
dbfs:/datasets/t-drive-trips/part-00012-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-253-1-c000.snappy.parquet part-00012-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-253-1-c000.snappy.parquet 3328393.0
dbfs:/datasets/t-drive-trips/part-00013-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-254-1-c000.snappy.parquet part-00013-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-254-1-c000.snappy.parquet 3314838.0
dbfs:/datasets/t-drive-trips/part-00014-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-255-1-c000.snappy.parquet part-00014-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-255-1-c000.snappy.parquet 3312383.0
dbfs:/datasets/t-drive-trips/part-00015-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-256-1-c000.snappy.parquet part-00015-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-256-1-c000.snappy.parquet 3317943.0
dbfs:/datasets/t-drive-trips/part-00016-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-257-1-c000.snappy.parquet part-00016-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-257-1-c000.snappy.parquet 3312259.0
dbfs:/datasets/t-drive-trips/part-00017-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-258-1-c000.snappy.parquet part-00017-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-258-1-c000.snappy.parquet 3326403.0
dbfs:/datasets/t-drive-trips/part-00018-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-259-1-c000.snappy.parquet part-00018-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-259-1-c000.snappy.parquet 3316396.0
dbfs:/datasets/t-drive-trips/part-00019-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-260-1-c000.snappy.parquet part-00019-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-260-1-c000.snappy.parquet 3334055.0
dbfs:/datasets/t-drive-trips/part-00020-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-261-1-c000.snappy.parquet part-00020-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-261-1-c000.snappy.parquet 3315604.0
dbfs:/datasets/t-drive-trips/part-00021-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-262-1-c000.snappy.parquet part-00021-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-262-1-c000.snappy.parquet 3322431.0
dbfs:/datasets/t-drive-trips/part-00022-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-263-1-c000.snappy.parquet part-00022-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-263-1-c000.snappy.parquet 3327427.0
dbfs:/datasets/t-drive-trips/part-00023-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-264-1-c000.snappy.parquet part-00023-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-264-1-c000.snappy.parquet 3309770.0
dbfs:/datasets/t-drive-trips/part-00024-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-265-1-c000.snappy.parquet part-00024-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-265-1-c000.snappy.parquet 3322627.0
dbfs:/datasets/t-drive-trips/part-00025-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-266-1-c000.snappy.parquet part-00025-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-266-1-c000.snappy.parquet 3325132.0
dbfs:/datasets/t-drive-trips/part-00026-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-267-1-c000.snappy.parquet part-00026-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-267-1-c000.snappy.parquet 3313093.0
dbfs:/datasets/t-drive-trips/part-00027-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-268-1-c000.snappy.parquet part-00027-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-268-1-c000.snappy.parquet 3316395.0
dbfs:/datasets/t-drive-trips/part-00028-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-269-1-c000.snappy.parquet part-00028-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-269-1-c000.snappy.parquet 3323660.0
dbfs:/datasets/t-drive-trips/part-00029-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-270-1-c000.snappy.parquet part-00029-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-270-1-c000.snappy.parquet 3337843.0
dbfs:/datasets/t-drive-trips/part-00030-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-271-1-c000.snappy.parquet part-00030-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-271-1-c000.snappy.parquet 3331530.0
dbfs:/datasets/t-drive-trips/part-00031-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-272-1-c000.snappy.parquet part-00031-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-272-1-c000.snappy.parquet 3335209.0
dbfs:/datasets/t-drive-trips/part-00032-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-273-1-c000.snappy.parquet part-00032-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-273-1-c000.snappy.parquet 3336128.0
dbfs:/datasets/t-drive-trips/part-00033-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-274-1-c000.snappy.parquet part-00033-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-274-1-c000.snappy.parquet 3342654.0
dbfs:/datasets/t-drive-trips/part-00034-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-275-1-c000.snappy.parquet part-00034-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-275-1-c000.snappy.parquet 3316266.0
dbfs:/datasets/t-drive-trips/part-00035-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-276-1-c000.snappy.parquet part-00035-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-276-1-c000.snappy.parquet 3317877.0
dbfs:/datasets/t-drive-trips/part-00036-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-277-1-c000.snappy.parquet part-00036-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-277-1-c000.snappy.parquet 3322694.0
dbfs:/datasets/t-drive-trips/part-00037-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-278-1-c000.snappy.parquet part-00037-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-278-1-c000.snappy.parquet 3330480.0
dbfs:/datasets/t-drive-trips/part-00038-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-279-1-c000.snappy.parquet part-00038-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-279-1-c000.snappy.parquet 3312271.0
dbfs:/datasets/t-drive-trips/part-00039-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-280-1-c000.snappy.parquet part-00039-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-280-1-c000.snappy.parquet 3314031.0
dbfs:/datasets/t-drive-trips/part-00040-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-281-1-c000.snappy.parquet part-00040-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-281-1-c000.snappy.parquet 3331866.0
dbfs:/datasets/t-drive-trips/part-00041-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-282-1-c000.snappy.parquet part-00041-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-282-1-c000.snappy.parquet 3322115.0
dbfs:/datasets/t-drive-trips/part-00042-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-283-1-c000.snappy.parquet part-00042-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-283-1-c000.snappy.parquet 3326874.0
dbfs:/datasets/t-drive-trips/part-00043-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-284-1-c000.snappy.parquet part-00043-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-284-1-c000.snappy.parquet 3327994.0
dbfs:/datasets/t-drive-trips/part-00044-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-285-1-c000.snappy.parquet part-00044-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-285-1-c000.snappy.parquet 3330087.0
dbfs:/datasets/t-drive-trips/part-00045-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-286-1-c000.snappy.parquet part-00045-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-286-1-c000.snappy.parquet 3328726.0
dbfs:/datasets/t-drive-trips/part-00046-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-287-1-c000.snappy.parquet part-00046-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-287-1-c000.snappy.parquet 3321983.0
dbfs:/datasets/t-drive-trips/part-00047-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-288-1-c000.snappy.parquet part-00047-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-288-1-c000.snappy.parquet 3332147.0
dbfs:/datasets/t-drive-trips/part-00048-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-289-1-c000.snappy.parquet part-00048-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-289-1-c000.snappy.parquet 3332842.0
dbfs:/datasets/t-drive-trips/part-00049-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-290-1-c000.snappy.parquet part-00049-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-290-1-c000.snappy.parquet 3323693.0
dbfs:/datasets/t-drive-trips/part-00050-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-291-1-c000.snappy.parquet part-00050-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-291-1-c000.snappy.parquet 3333414.0
dbfs:/datasets/t-drive-trips/part-00051-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-292-1-c000.snappy.parquet part-00051-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-292-1-c000.snappy.parquet 3303953.0
dbfs:/datasets/t-drive-trips/part-00052-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-293-1-c000.snappy.parquet part-00052-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-293-1-c000.snappy.parquet 3338614.0
dbfs:/datasets/t-drive-trips/part-00053-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-294-1-c000.snappy.parquet part-00053-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-294-1-c000.snappy.parquet 3330205.0
dbfs:/datasets/t-drive-trips/part-00054-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-295-1-c000.snappy.parquet part-00054-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-295-1-c000.snappy.parquet 3306341.0
dbfs:/datasets/t-drive-trips/part-00055-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-296-1-c000.snappy.parquet part-00055-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-296-1-c000.snappy.parquet 3333542.0
dbfs:/datasets/t-drive-trips/part-00056-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-297-1-c000.snappy.parquet part-00056-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-297-1-c000.snappy.parquet 3315821.0
dbfs:/datasets/t-drive-trips/part-00057-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-298-1-c000.snappy.parquet part-00057-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-298-1-c000.snappy.parquet 3332481.0
dbfs:/datasets/t-drive-trips/part-00058-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-299-1-c000.snappy.parquet part-00058-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-299-1-c000.snappy.parquet 3338906.0
dbfs:/datasets/t-drive-trips/part-00059-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-300-1-c000.snappy.parquet part-00059-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-300-1-c000.snappy.parquet 3296655.0
dbfs:/datasets/t-drive-trips/part-00060-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-301-1-c000.snappy.parquet part-00060-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-301-1-c000.snappy.parquet 3324196.0
dbfs:/datasets/t-drive-trips/part-00061-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-302-1-c000.snappy.parquet part-00061-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-302-1-c000.snappy.parquet 3328037.0
dbfs:/datasets/t-drive-trips/part-00062-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-303-1-c000.snappy.parquet part-00062-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-303-1-c000.snappy.parquet 3304713.0
dbfs:/datasets/t-drive-trips/part-00063-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-304-1-c000.snappy.parquet part-00063-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-304-1-c000.snappy.parquet 3322291.0
dbfs:/datasets/t-drive-trips/part-00064-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-305-1-c000.snappy.parquet part-00064-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-305-1-c000.snappy.parquet 3315149.0
dbfs:/datasets/t-drive-trips/part-00065-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-306-1-c000.snappy.parquet part-00065-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-306-1-c000.snappy.parquet 3331060.0
dbfs:/datasets/t-drive-trips/part-00066-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-307-1-c000.snappy.parquet part-00066-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-307-1-c000.snappy.parquet 3319447.0
dbfs:/datasets/t-drive-trips/part-00067-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-308-1-c000.snappy.parquet part-00067-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-308-1-c000.snappy.parquet 3302431.0
dbfs:/datasets/t-drive-trips/part-00068-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-309-1-c000.snappy.parquet part-00068-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-309-1-c000.snappy.parquet 3318678.0
dbfs:/datasets/t-drive-trips/part-00069-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-310-1-c000.snappy.parquet part-00069-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-310-1-c000.snappy.parquet 3310107.0
dbfs:/datasets/t-drive-trips/part-00070-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-311-1-c000.snappy.parquet part-00070-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-311-1-c000.snappy.parquet 3332591.0
dbfs:/datasets/t-drive-trips/part-00071-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-312-1-c000.snappy.parquet part-00071-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-312-1-c000.snappy.parquet 3313772.0
dbfs:/datasets/t-drive-trips/part-00072-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-313-1-c000.snappy.parquet part-00072-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-313-1-c000.snappy.parquet 3317966.0
dbfs:/datasets/t-drive-trips/part-00073-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-314-1-c000.snappy.parquet part-00073-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-314-1-c000.snappy.parquet 3324060.0
dbfs:/datasets/t-drive-trips/part-00074-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-315-1-c000.snappy.parquet part-00074-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-315-1-c000.snappy.parquet 3333476.0
dbfs:/datasets/t-drive-trips/part-00075-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-316-1-c000.snappy.parquet part-00075-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-316-1-c000.snappy.parquet 3303679.0
dbfs:/datasets/t-drive-trips/part-00076-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-317-1-c000.snappy.parquet part-00076-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-317-1-c000.snappy.parquet 3328348.0
dbfs:/datasets/t-drive-trips/part-00077-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-318-1-c000.snappy.parquet part-00077-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-318-1-c000.snappy.parquet 3313956.0
dbfs:/datasets/t-drive-trips/part-00078-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-319-1-c000.snappy.parquet part-00078-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-319-1-c000.snappy.parquet 3312261.0
dbfs:/datasets/t-drive-trips/part-00079-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-320-1-c000.snappy.parquet part-00079-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-320-1-c000.snappy.parquet 3328554.0
dbfs:/datasets/t-drive-trips/part-00080-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-321-1-c000.snappy.parquet part-00080-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-321-1-c000.snappy.parquet 3324701.0
dbfs:/datasets/t-drive-trips/part-00081-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-322-1-c000.snappy.parquet part-00081-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-322-1-c000.snappy.parquet 3320848.0
dbfs:/datasets/t-drive-trips/part-00082-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-323-1-c000.snappy.parquet part-00082-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-323-1-c000.snappy.parquet 3328741.0
dbfs:/datasets/t-drive-trips/part-00083-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-324-1-c000.snappy.parquet part-00083-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-324-1-c000.snappy.parquet 3316403.0
dbfs:/datasets/t-drive-trips/part-00084-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-325-1-c000.snappy.parquet part-00084-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-325-1-c000.snappy.parquet 3315476.0
dbfs:/datasets/t-drive-trips/part-00085-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-326-1-c000.snappy.parquet part-00085-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-326-1-c000.snappy.parquet 3330775.0
dbfs:/datasets/t-drive-trips/part-00086-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-327-1-c000.snappy.parquet part-00086-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-327-1-c000.snappy.parquet 3340227.0
dbfs:/datasets/t-drive-trips/part-00087-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-328-1-c000.snappy.parquet part-00087-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-328-1-c000.snappy.parquet 3307622.0
dbfs:/datasets/t-drive-trips/part-00088-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-329-1-c000.snappy.parquet part-00088-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-329-1-c000.snappy.parquet 3316059.0
dbfs:/datasets/t-drive-trips/part-00089-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-330-1-c000.snappy.parquet part-00089-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-330-1-c000.snappy.parquet 3320127.0
dbfs:/datasets/t-drive-trips/part-00090-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-331-1-c000.snappy.parquet part-00090-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-331-1-c000.snappy.parquet 3326165.0
dbfs:/datasets/t-drive-trips/part-00091-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-332-1-c000.snappy.parquet part-00091-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-332-1-c000.snappy.parquet 3333324.0
dbfs:/datasets/t-drive-trips/part-00092-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-333-1-c000.snappy.parquet part-00092-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-333-1-c000.snappy.parquet 3326087.0
dbfs:/datasets/t-drive-trips/part-00093-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-334-1-c000.snappy.parquet part-00093-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-334-1-c000.snappy.parquet 3305959.0
dbfs:/datasets/t-drive-trips/part-00094-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-335-1-c000.snappy.parquet part-00094-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-335-1-c000.snappy.parquet 3321270.0
dbfs:/datasets/t-drive-trips/part-00095-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-336-1-c000.snappy.parquet part-00095-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-336-1-c000.snappy.parquet 3309747.0
dbfs:/datasets/t-drive-trips/part-00096-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-337-1-c000.snappy.parquet part-00096-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-337-1-c000.snappy.parquet 3329141.0
dbfs:/datasets/t-drive-trips/part-00097-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-338-1-c000.snappy.parquet part-00097-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-338-1-c000.snappy.parquet 3328356.0
dbfs:/datasets/t-drive-trips/part-00098-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-339-1-c000.snappy.parquet part-00098-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-339-1-c000.snappy.parquet 3315952.0
dbfs:/datasets/t-drive-trips/part-00099-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-340-1-c000.snappy.parquet part-00099-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-340-1-c000.snappy.parquet 3323318.0
ls dbfs:/datasets/t-drive-trips-magellan
path name size
dbfs:/datasets/t-drive-trips-magellan/_committed_3681160352079832245 _committed_3681160352079832245 10024.0
dbfs:/datasets/t-drive-trips-magellan/_committed_5210346527157697119 _committed_5210346527157697119 19634.0
dbfs:/datasets/t-drive-trips-magellan/_committed_5745945769935478881 _committed_5745945769935478881 19623.0
dbfs:/datasets/t-drive-trips-magellan/_committed_6691263342585570063 _committed_6691263342585570063 19623.0
dbfs:/datasets/t-drive-trips-magellan/_committed_vacuum5880288197393177303 _committed_vacuum5880288197393177303 129.0
dbfs:/datasets/t-drive-trips-magellan/_started_5745945769935478881 _started_5745945769935478881 0.0
dbfs:/datasets/t-drive-trips-magellan/_started_6691263342585570063 _started_6691263342585570063 0.0
dbfs:/datasets/t-drive-trips-magellan/part-00000-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-745-1-c000.snappy.parquet part-00000-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-745-1-c000.snappy.parquet 6116439.0
dbfs:/datasets/t-drive-trips-magellan/part-00001-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-747-1-c000.snappy.parquet part-00001-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-747-1-c000.snappy.parquet 6109529.0
dbfs:/datasets/t-drive-trips-magellan/part-00002-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-749-1-c000.snappy.parquet part-00002-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-749-1-c000.snappy.parquet 6127623.0
dbfs:/datasets/t-drive-trips-magellan/part-00003-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-750-1-c000.snappy.parquet part-00003-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-750-1-c000.snappy.parquet 6075264.0
dbfs:/datasets/t-drive-trips-magellan/part-00004-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-751-1-c000.snappy.parquet part-00004-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-751-1-c000.snappy.parquet 6121668.0
dbfs:/datasets/t-drive-trips-magellan/part-00005-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-752-1-c000.snappy.parquet part-00005-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-752-1-c000.snappy.parquet 6160575.0
dbfs:/datasets/t-drive-trips-magellan/part-00006-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-753-1-c000.snappy.parquet part-00006-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-753-1-c000.snappy.parquet 6128421.0
dbfs:/datasets/t-drive-trips-magellan/part-00007-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-754-1-c000.snappy.parquet part-00007-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-754-1-c000.snappy.parquet 6094968.0
dbfs:/datasets/t-drive-trips-magellan/part-00008-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-755-1-c000.snappy.parquet part-00008-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-755-1-c000.snappy.parquet 6177364.0
dbfs:/datasets/t-drive-trips-magellan/part-00009-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-756-1-c000.snappy.parquet part-00009-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-756-1-c000.snappy.parquet 6156075.0
dbfs:/datasets/t-drive-trips-magellan/part-00010-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-757-1-c000.snappy.parquet part-00010-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-757-1-c000.snappy.parquet 6128188.0
dbfs:/datasets/t-drive-trips-magellan/part-00011-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-758-1-c000.snappy.parquet part-00011-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-758-1-c000.snappy.parquet 6087318.0
dbfs:/datasets/t-drive-trips-magellan/part-00012-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-759-1-c000.snappy.parquet part-00012-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-759-1-c000.snappy.parquet 6163969.0
dbfs:/datasets/t-drive-trips-magellan/part-00013-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-760-1-c000.snappy.parquet part-00013-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-760-1-c000.snappy.parquet 6191786.0
dbfs:/datasets/t-drive-trips-magellan/part-00014-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-761-1-c000.snappy.parquet part-00014-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-761-1-c000.snappy.parquet 6100593.0
dbfs:/datasets/t-drive-trips-magellan/part-00015-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-762-1-c000.snappy.parquet part-00015-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-762-1-c000.snappy.parquet 6143283.0
dbfs:/datasets/t-drive-trips-magellan/part-00016-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-763-1-c000.snappy.parquet part-00016-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-763-1-c000.snappy.parquet 6179004.0
dbfs:/datasets/t-drive-trips-magellan/part-00017-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-766-1-c000.snappy.parquet part-00017-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-766-1-c000.snappy.parquet 6109483.0
dbfs:/datasets/t-drive-trips-magellan/part-00018-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-767-1-c000.snappy.parquet part-00018-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-767-1-c000.snappy.parquet 6091116.0
dbfs:/datasets/t-drive-trips-magellan/part-00019-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-768-1-c000.snappy.parquet part-00019-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-768-1-c000.snappy.parquet 6175989.0
dbfs:/datasets/t-drive-trips-magellan/part-00020-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-770-1-c000.snappy.parquet part-00020-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-770-1-c000.snappy.parquet 6164017.0
dbfs:/datasets/t-drive-trips-magellan/part-00021-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-772-1-c000.snappy.parquet part-00021-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-772-1-c000.snappy.parquet 6086937.0
dbfs:/datasets/t-drive-trips-magellan/part-00022-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-773-1-c000.snappy.parquet part-00022-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-773-1-c000.snappy.parquet 6121136.0
dbfs:/datasets/t-drive-trips-magellan/part-00023-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-776-1-c000.snappy.parquet part-00023-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-776-1-c000.snappy.parquet 6113180.0
dbfs:/datasets/t-drive-trips-magellan/part-00024-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-777-1-c000.snappy.parquet part-00024-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-777-1-c000.snappy.parquet 6141102.0
dbfs:/datasets/t-drive-trips-magellan/part-00025-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-778-1-c000.snappy.parquet part-00025-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-778-1-c000.snappy.parquet 6107475.0
dbfs:/datasets/t-drive-trips-magellan/part-00026-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-779-1-c000.snappy.parquet part-00026-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-779-1-c000.snappy.parquet 6108195.0
dbfs:/datasets/t-drive-trips-magellan/part-00027-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-780-1-c000.snappy.parquet part-00027-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-780-1-c000.snappy.parquet 6145437.0
dbfs:/datasets/t-drive-trips-magellan/part-00028-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-781-1-c000.snappy.parquet part-00028-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-781-1-c000.snappy.parquet 6108490.0
dbfs:/datasets/t-drive-trips-magellan/part-00029-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-782-1-c000.snappy.parquet part-00029-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-782-1-c000.snappy.parquet 6172917.0
dbfs:/datasets/t-drive-trips-magellan/part-00030-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-783-1-c000.snappy.parquet part-00030-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-783-1-c000.snappy.parquet 6162200.0
dbfs:/datasets/t-drive-trips-magellan/part-00031-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-784-1-c000.snappy.parquet part-00031-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-784-1-c000.snappy.parquet 6034541.0
dbfs:/datasets/t-drive-trips-magellan/part-00032-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-785-1-c000.snappy.parquet part-00032-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-785-1-c000.snappy.parquet 6178715.0
dbfs:/datasets/t-drive-trips-magellan/part-00033-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-786-1-c000.snappy.parquet part-00033-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-786-1-c000.snappy.parquet 6045366.0
dbfs:/datasets/t-drive-trips-magellan/part-00034-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-787-1-c000.snappy.parquet part-00034-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-787-1-c000.snappy.parquet 6055861.0
dbfs:/datasets/t-drive-trips-magellan/part-00035-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-788-1-c000.snappy.parquet part-00035-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-788-1-c000.snappy.parquet 6102537.0
dbfs:/datasets/t-drive-trips-magellan/part-00036-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-789-1-c000.snappy.parquet part-00036-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-789-1-c000.snappy.parquet 6146001.0
dbfs:/datasets/t-drive-trips-magellan/part-00037-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-790-1-c000.snappy.parquet part-00037-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-790-1-c000.snappy.parquet 6115954.0
dbfs:/datasets/t-drive-trips-magellan/part-00038-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-791-1-c000.snappy.parquet part-00038-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-791-1-c000.snappy.parquet 6189674.0
dbfs:/datasets/t-drive-trips-magellan/part-00039-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-792-1-c000.snappy.parquet part-00039-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-792-1-c000.snappy.parquet 6125360.0
dbfs:/datasets/t-drive-trips-magellan/part-00040-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-793-1-c000.snappy.parquet part-00040-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-793-1-c000.snappy.parquet 6129475.0
dbfs:/datasets/t-drive-trips-magellan/part-00041-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-794-1-c000.snappy.parquet part-00041-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-794-1-c000.snappy.parquet 6096233.0
dbfs:/datasets/t-drive-trips-magellan/part-00042-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-795-1-c000.snappy.parquet part-00042-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-795-1-c000.snappy.parquet 6102240.0
dbfs:/datasets/t-drive-trips-magellan/part-00043-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-796-1-c000.snappy.parquet part-00043-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-796-1-c000.snappy.parquet 6092224.0
dbfs:/datasets/t-drive-trips-magellan/part-00044-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-797-1-c000.snappy.parquet part-00044-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-797-1-c000.snappy.parquet 6150214.0
dbfs:/datasets/t-drive-trips-magellan/part-00045-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-798-1-c000.snappy.parquet part-00045-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-798-1-c000.snappy.parquet 6154492.0
dbfs:/datasets/t-drive-trips-magellan/part-00046-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-799-1-c000.snappy.parquet part-00046-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-799-1-c000.snappy.parquet 6075132.0
dbfs:/datasets/t-drive-trips-magellan/part-00047-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-800-1-c000.snappy.parquet part-00047-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-800-1-c000.snappy.parquet 6159253.0
dbfs:/datasets/t-drive-trips-magellan/part-00048-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-801-1-c000.snappy.parquet part-00048-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-801-1-c000.snappy.parquet 6147865.0
dbfs:/datasets/t-drive-trips-magellan/part-00049-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-802-1-c000.snappy.parquet part-00049-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-802-1-c000.snappy.parquet 6109401.0
dbfs:/datasets/t-drive-trips-magellan/part-00050-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-803-1-c000.snappy.parquet part-00050-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-803-1-c000.snappy.parquet 6098660.0
dbfs:/datasets/t-drive-trips-magellan/part-00051-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-804-1-c000.snappy.parquet part-00051-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-804-1-c000.snappy.parquet 6065365.0
dbfs:/datasets/t-drive-trips-magellan/part-00052-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-805-1-c000.snappy.parquet part-00052-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-805-1-c000.snappy.parquet 6166406.0
dbfs:/datasets/t-drive-trips-magellan/part-00053-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-806-1-c000.snappy.parquet part-00053-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-806-1-c000.snappy.parquet 6123940.0
dbfs:/datasets/t-drive-trips-magellan/part-00054-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-807-1-c000.snappy.parquet part-00054-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-807-1-c000.snappy.parquet 6182341.0
dbfs:/datasets/t-drive-trips-magellan/part-00055-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-808-1-c000.snappy.parquet part-00055-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-808-1-c000.snappy.parquet 6107282.0
dbfs:/datasets/t-drive-trips-magellan/part-00056-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-809-1-c000.snappy.parquet part-00056-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-809-1-c000.snappy.parquet 6114309.0
dbfs:/datasets/t-drive-trips-magellan/part-00057-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-810-1-c000.snappy.parquet part-00057-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-810-1-c000.snappy.parquet 6151453.0
dbfs:/datasets/t-drive-trips-magellan/part-00058-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-811-1-c000.snappy.parquet part-00058-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-811-1-c000.snappy.parquet 6203356.0
dbfs:/datasets/t-drive-trips-magellan/part-00059-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-812-1-c000.snappy.parquet part-00059-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-812-1-c000.snappy.parquet 6113065.0
dbfs:/datasets/t-drive-trips-magellan/part-00060-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-813-1-c000.snappy.parquet part-00060-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-813-1-c000.snappy.parquet 6101909.0
dbfs:/datasets/t-drive-trips-magellan/part-00061-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-814-1-c000.snappy.parquet part-00061-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-814-1-c000.snappy.parquet 6134861.0
dbfs:/datasets/t-drive-trips-magellan/part-00062-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-815-1-c000.snappy.parquet part-00062-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-815-1-c000.snappy.parquet 6106498.0
dbfs:/datasets/t-drive-trips-magellan/part-00063-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-816-1-c000.snappy.parquet part-00063-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-816-1-c000.snappy.parquet 6147787.0
dbfs:/datasets/t-drive-trips-magellan/part-00064-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-817-1-c000.snappy.parquet part-00064-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-817-1-c000.snappy.parquet 6100162.0
dbfs:/datasets/t-drive-trips-magellan/part-00065-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-818-1-c000.snappy.parquet part-00065-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-818-1-c000.snappy.parquet 6108290.0
dbfs:/datasets/t-drive-trips-magellan/part-00066-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-819-1-c000.snappy.parquet part-00066-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-819-1-c000.snappy.parquet 6089103.0
dbfs:/datasets/t-drive-trips-magellan/part-00067-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-820-1-c000.snappy.parquet part-00067-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-820-1-c000.snappy.parquet 6166248.0
dbfs:/datasets/t-drive-trips-magellan/part-00068-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-821-1-c000.snappy.parquet part-00068-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-821-1-c000.snappy.parquet 6115293.0
dbfs:/datasets/t-drive-trips-magellan/part-00069-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-822-1-c000.snappy.parquet part-00069-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-822-1-c000.snappy.parquet 6076361.0
dbfs:/datasets/t-drive-trips-magellan/part-00070-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-746-1-c000.snappy.parquet part-00070-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-746-1-c000.snappy.parquet 6140281.0
dbfs:/datasets/t-drive-trips-magellan/part-00071-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-748-1-c000.snappy.parquet part-00071-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-748-1-c000.snappy.parquet 6082413.0
dbfs:/datasets/t-drive-trips-magellan/part-00072-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-823-1-c000.snappy.parquet part-00072-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-823-1-c000.snappy.parquet 6018901.0
dbfs:/datasets/t-drive-trips-magellan/part-00073-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-824-1-c000.snappy.parquet part-00073-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-824-1-c000.snappy.parquet 6170865.0
dbfs:/datasets/t-drive-trips-magellan/part-00074-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-825-1-c000.snappy.parquet part-00074-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-825-1-c000.snappy.parquet 6158579.0
dbfs:/datasets/t-drive-trips-magellan/part-00075-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-826-1-c000.snappy.parquet part-00075-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-826-1-c000.snappy.parquet 6180276.0
dbfs:/datasets/t-drive-trips-magellan/part-00076-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-827-1-c000.snappy.parquet part-00076-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-827-1-c000.snappy.parquet 6090038.0
dbfs:/datasets/t-drive-trips-magellan/part-00077-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-828-1-c000.snappy.parquet part-00077-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-828-1-c000.snappy.parquet 6014542.0
dbfs:/datasets/t-drive-trips-magellan/part-00078-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-829-1-c000.snappy.parquet part-00078-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-829-1-c000.snappy.parquet 6053589.0
dbfs:/datasets/t-drive-trips-magellan/part-00079-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-830-1-c000.snappy.parquet part-00079-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-830-1-c000.snappy.parquet 6151854.0
dbfs:/datasets/t-drive-trips-magellan/part-00080-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-831-1-c000.snappy.parquet part-00080-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-831-1-c000.snappy.parquet 6108894.0
dbfs:/datasets/t-drive-trips-magellan/part-00081-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-832-1-c000.snappy.parquet part-00081-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-832-1-c000.snappy.parquet 6182298.0
dbfs:/datasets/t-drive-trips-magellan/part-00082-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-833-1-c000.snappy.parquet part-00082-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-833-1-c000.snappy.parquet 6143713.0
dbfs:/datasets/t-drive-trips-magellan/part-00083-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-834-1-c000.snappy.parquet part-00083-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-834-1-c000.snappy.parquet 6100835.0
dbfs:/datasets/t-drive-trips-magellan/part-00084-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-835-1-c000.snappy.parquet part-00084-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-835-1-c000.snappy.parquet 6112446.0
dbfs:/datasets/t-drive-trips-magellan/part-00085-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-836-1-c000.snappy.parquet part-00085-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-836-1-c000.snappy.parquet 6113911.0
dbfs:/datasets/t-drive-trips-magellan/part-00086-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-764-1-c000.snappy.parquet part-00086-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-764-1-c000.snappy.parquet 6046043.0
dbfs:/datasets/t-drive-trips-magellan/part-00087-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-765-1-c000.snappy.parquet part-00087-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-765-1-c000.snappy.parquet 6129799.0
dbfs:/datasets/t-drive-trips-magellan/part-00088-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-837-1-c000.snappy.parquet part-00088-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-837-1-c000.snappy.parquet 6143620.0
dbfs:/datasets/t-drive-trips-magellan/part-00089-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-838-1-c000.snappy.parquet part-00089-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-838-1-c000.snappy.parquet 6122388.0
dbfs:/datasets/t-drive-trips-magellan/part-00090-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-769-1-c000.snappy.parquet part-00090-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-769-1-c000.snappy.parquet 6110756.0
dbfs:/datasets/t-drive-trips-magellan/part-00091-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-771-1-c000.snappy.parquet part-00091-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-771-1-c000.snappy.parquet 6114874.0
dbfs:/datasets/t-drive-trips-magellan/part-00092-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-839-1-c000.snappy.parquet part-00092-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-839-1-c000.snappy.parquet 6095812.0
dbfs:/datasets/t-drive-trips-magellan/part-00093-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-840-1-c000.snappy.parquet part-00093-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-840-1-c000.snappy.parquet 6061281.0
dbfs:/datasets/t-drive-trips-magellan/part-00094-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-774-1-c000.snappy.parquet part-00094-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-774-1-c000.snappy.parquet 6202331.0
dbfs:/datasets/t-drive-trips-magellan/part-00095-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-775-1-c000.snappy.parquet part-00095-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-775-1-c000.snappy.parquet 6130937.0

ScaDaMaLe Course site and book

This is part of Project MEP: Meme Evolution Programme and supported by databricks academic partners program.

Map-matching Noisy Spatial Trajectories of Vehicles to Roadways in Open Street Map

Dillon George, Dan Lilja and Raazesh Sainudiin

Copyright 2016-2019 Dillon George, Dan Lilja and Raazesh Sainudiin

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

This is the precursor 2016 presentation by Dillon George, done as part of a Scalable Data Science from Middle Earth student project.

sds/uji/studentProjects/01DillonGeorge/038UberMapMatchingAndVisualization

Here we are updating it to more recent versions of the needed libraries.

What is map-matching?

Map matching is the problem of how to match recorded geographic coordinates to a logical model of the real world, typically using some form of Geographic Information System. See https://en.wikipedia.org/wiki/Map_matching.
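As a toy illustration of the geometric core of the problem (not GraphHopper's actual algorithm, which also runs a Hidden Markov Model over candidate road segments), here is a minimal sketch, with hypothetical names, of snapping one noisy GPS point onto a road segment:

```scala
// Toy sketch: project a noisy point onto the nearest point of a road segment.
// All names are illustrative; real map matchers combine this geometric step
// with probabilistic reasoning over sequences of candidate segments.
case class Pt(x: Double, y: Double)

def snapToSegment(p: Pt, a: Pt, b: Pt): Pt = {
  val (dx, dy) = (b.x - a.x, b.y - a.y)
  val len2 = dx * dx + dy * dy
  // Fraction along a->b of the perpendicular foot, clamped to the segment.
  val t = if (len2 == 0) 0.0
          else math.max(0.0, math.min(1.0, ((p.x - a.x) * dx + (p.y - a.y) * dy) / len2))
  Pt(a.x + t * dx, a.y + t * dy)
}

// A point just off a horizontal road is snapped straight down onto it:
// snapToSegment(Pt(0.5, 0.1), Pt(0.0, 0.0), Pt(1.0, 0.0)) == Pt(0.5, 0.0)
```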

//This allows easy embedding of publicly available information into any other notebook
//when viewing in git-book just ignore this block - you may have to manually chase the URL in frameIt("URL").
//Example usage:
// displayHTML(frameIt("https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation#Topics_in_LDA",250))
def frameIt( u:String, h:Int ) : String = {
      """<iframe 
 src=""""+ u+""""
 width="95%" height="""" + h + """"
 sandbox>
  <p>
    <a href="http://spark.apache.org/docs/latest/index.html">
      Fallback link for browsers that don't support frames
    </a>
  </p>
</iframe>"""
   }
displayHTML(frameIt("https://en.wikipedia.org/wiki/Map_matching",600))

Why are we interested in map-matching?

Mainly because we can naturally deal with noise in raw GPS trajectories of entities moving along mapped ways, such as vehicles, pedestrians or cyclists.

  • Trajectories from sources like Uber are typically noisy and we will map-match such trajectories in this worksheet.
  • Often, map-matching such trajectories leads to a significant reduction in graph dimensionality, as you will see below.
  • More importantly, map-matching is a natural first step towards learning distributions over historical trajectories of an entity.
  • Moreover, a set of map-matched trajectories (with additional work using kNN operations) can be turned into a GraphX graph that can be vertex-programmed and joined with other GraphX representations of the map itself.

How are we map-matching?

We are using GraphHopper for this for now. See https://en.wikipedia.org/wiki/GraphHopper.

The basic steps are the following:

  0. Preliminaries: attach the needed libraries, load OSM data and initialize GraphHopper (steps 0.1 and 0.2 need to be done only once per cluster)
  1. Set up leaflet for visualisation
  2. Load the table of Uber data from the earlier analysis, then convert it to an RDD for map-matching
  3. Start map-matching
  4. Display the results of a map-matched trajectory
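In outline, the GraphHopper side of these steps looks roughly as follows. This is a sketch against the map-matching 0.6.x API attached below: the constructor signatures changed across map-matching releases, and the OSM file path is purely illustrative, so treat it as an outline rather than a recipe.

```scala
// Sketch only (map-matching 0.6.x-style API; paths are hypothetical):
// build or load the road graph from an OSM file, then match a trace to it.
val encoder = new CarFlagEncoder()
val hopper = new GraphHopper()
  .setOSMFile("/dbfs/datasets/maps/SanFranciscoSmall.osm") // illustrative path
  .setGraphHopperLocation("/tmp/graphhopper")              // local graph cache
hopper.setEncodingManager(new EncodingManager(encoder))
hopper.importOrLoad() // builds the road graph once, then reloads it from cache

// The matcher needs the road graph plus a spatial index over its edges.
val graph = hopper.getGraphHopperStorage
val locationIndex = new LocationIndexMatch(
  graph, hopper.getLocationIndex.asInstanceOf[LocationIndexTree])
val mapMatching = new MapMatching(graph, locationIndex, encoder)

// One noisy trajectory: (lat, lon, millis) triples -> matched road edges.
val trace = Seq(
  new GPXEntry(37.750, -122.420, 0L),
  new GPXEntry(37.751, -122.419, 5000L)).asJava
val matchResult = mapMatching.doWork(trace)
```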
1. Preliminaries

Loading required libraries

  1. Launch a cluster using Spark 2.4.3 (this is for compatibility with magellan built from the forked repos; see the first notebook in this folder!).
  2. Attach the following libraries if you have not already done so:
  • map_matching - com.graphhopper:map-matching:0.6.0 (more recent libraries may work but are not tested yet!)
  • magellan - import the custom-built jar by downloading it locally from https://github.com/lamastex/scalable-data-science/blob/master/custom-builds/jars/magellan/forks/ and then uploading it to databricks
  • Only if needed (this is already in databricks): spray-json io.spray:spray-json_2.11:1.3.4
import com.graphhopper.matching._
import com.graphhopper._
import com.graphhopper.routing.util.{EncodingManager, CarFlagEncoder}
import com.graphhopper.storage.index.LocationIndexTree
import com.graphhopper.util.GPXEntry

import magellan.Point

import scala.collection.JavaConverters._
import spray.json._
import DefaultJsonProtocol._

import scala.util.{Try, Success, Failure}

import org.apache.spark.sql.functions._
import com.graphhopper.matching._
import com.graphhopper._
import com.graphhopper.routing.util.{EncodingManager, CarFlagEncoder}
import com.graphhopper.storage.index.LocationIndexTree
import com.graphhopper.util.GPXEntry
import magellan.Point
import scala.collection.JavaConverters._
import spray.json._
import DefaultJsonProtocol._
import scala.util.{Try, Success, Failure}
import org.apache.spark.sql.functions._

Do Step 0 at the bottom of the notebook

Only once per shard per OSM file (ignore this step the second time!):

  • follow section below on Step 0.1: Loading our OSM Data
  • follow section below on Step 0.2: Initialising GraphHopper

NOTE

If you loaded a smaller map so as to be able to analyze in the community edition, then you need the bounding box of this map to filter those trajectories that fall within this smaller map.

For example, the SanfranciscoSmall OSM map has the following bounding box:

  • -122.449,37.747 and -122.397,37.772

Let's put them in Scala vals as follows (note that, despite their names, the Lat-named vals hold longitudes and the Lon-named vals hold latitudes; this matches the latlon arrays below, which store longitude first, so the bounding-box filter later compares like with like):

val minLatInOSMMap = -122.449
val minLonInOSMMap = 37.747 
val maxLatInOSMMap = -122.397 
val maxLonInOSMMap = 37.772
minLatInOSMMap: Double = -122.449
minLonInOSMMap: Double = 37.747
maxLatInOSMMap: Double = -122.397
maxLonInOSMMap: Double = 37.772
  2. Setting up leaflet and visualisation

2.1 Setting up leaflet

You need to go to the following URL and set up an access token in Mapbox to use Leaflet independently:

  • https://leafletjs.com/examples/quick-start/
  • Request access-token:
    • https://account.mapbox.com/auth/signin/?route-to=%22/access-tokens/%22

2.2 Visualising with leaflet

Take an array of Strings in 'GeoJson' format, then insert this into a prebuilt html string that contains all the code necessary to display these features using Leaflet. The resulting html can be displayed in databricks using the displayHTML function.

See http://leafletjs.com/examples/geojson.html for a detailed example of using GeoJson with Leaflet.
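For orientation, each element of the features array is expected to be a GeoJSON geometry string along the lines of the following minimal, made-up MultiPoint (the coordinates here are illustrative, not taken from the dataset):

```scala
// A minimal GeoJSON MultiPoint geometry string, of the kind genLeafletHTML
// splices into its html; coordinates are (longitude, latitude) pairs
val exampleFeature: String =
  """{
    |  "type": "MultiPoint",
    |  "coordinates": [[-122.42, 37.77], [-122.41, 37.76]]
    |}""".stripMargin
```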

def genLeafletHTML(features: Array[String]): String = {

  val featureArray = features.reduce(_ + "," +  _)
  // get your own access-token from https://leafletjs.com/examples/quick-start/
  // see request-access token link above at: https://account.mapbox.com/auth/signin/?route-to=%22/access-tokens/%22
  val accessToken = "pk.eyJ1Ijoic3RhdnJvdWxhdmxhY2hvdSIsImEiOiJjbDEzY3EwNDcycjBzM2JrYnBuemx4bmZkIn0.2DhL_f07vB0i7psep_QR8Q"

  val generatedHTML = f"""<!DOCTYPE html>
  <html>
  <head>
  <title>Maps</title>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.7/leaflet.css">
  <style>
  #map {width: 600px; height:400px;}
  </style>

  </head>
  <body>
  <div id="map" style="width: 1000px; height: 600px"></div>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.7/leaflet.js"></script>
  <script type="text/javascript">
  var map = L.map('map').setView([37.77471008393265, -122.40422604391485], 14);

  L.tileLayer('https://api.tiles.mapbox.com/v4/{id}/{z}/{x}/{y}.png?access_token=$accessToken', {
  maxZoom: 18,
  attribution: 'Map data &copy; <a href="http://openstreetmap.org">OpenStreetMap</a> contributors, ' +
  '<a href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a>, ' +
  'Imagery © <a href="http://mapbox.com">Mapbox</a>',
  id: 'mapbox.streets'
  }).addTo(map);

  var features = [$featureArray];

 colors = features.map(function (_) {return rainbow(100, Math.floor(Math.random() * 100)); });

  for (var i = 0; i < features.length; i++) {
      console.log(i);
      L.geoJson(features[i], {
          pointToLayer: function (feature, latlng) {
              return L.circleMarker(latlng, {
                  radius: 4,
                  fillColor: colors[i],
                  color: colors[i],
                  weight: 1,
                  opacity: 1,
                  fillOpacity: 0.8
              });
          }
      }).addTo(map);
  }


  function rainbow(numOfSteps, step) {
  // This function generates vibrant, "evenly spaced" colours (i.e. no clustering). This is ideal for creating easily distinguishable vibrant markers in Google Maps and other apps.
  // Adam Cole, 2011-Sept-14
  // HSV to RBG adapted from: http://mjijackson.com/2008/02/rgb-to-hsl-and-rgb-to-hsv-color-model-conversion-algorithms-in-javascript
  var r, g, b;
  var h = step / numOfSteps;
  var i = ~~(h * 6);
  var f = h * 6 - i;
  var q = 1 - f;
  switch(i %% 6){
  case 0: r = 1; g = f; b = 0; break;
  case 1: r = q; g = 1; b = 0; break;
  case 2: r = 0; g = 1; b = f; break;
  case 3: r = 0; g = q; b = 1; break;
  case 4: r = f; g = 0; b = 1; break;
  case 5: r = 1; g = 0; b = q; break;
  }
  var c = "#" + ("00" + (~ ~(r * 255)).toString(16)).slice(-2) + ("00" + (~ ~(g * 255)).toString(16)).slice(-2) + ("00" + (~ ~(b * 255)).toString(16)).slice(-2);
  return (c);
  }
  </script>


  </body>
  """
  generatedHTML
}
genLeafletHTML: (features: Array[String])String
  3. Load Uber Data as in earlier analysis.

Then convert to an RDD for mapmatching

case class UberRecord(tripId: Int, time: String, latlon: Array[Double])

val uberData = sc.textFile("dbfs:/datasets/magellan/all.tsv").map { line =>
  val parts = line.split("\t" )
  val tripId = parts(0).toInt
  val time  = parts(1)
  val latlon = Array(parts(3).toDouble, parts(2).toDouble)
  UberRecord(tripId, time, latlon)
}.
repartition(100).
toDF().
select($"tripId", to_utc_timestamp($"time", "yyyy-MM-dd'T'HH:mm:ss").as("timeStamp"), $"latlon").
cache()
defined class UberRecord
uberData: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [tripId: int, timeStamp: timestamp ... 1 more field]
uberData.count()
res1: Long = 1128663
uberData.show(5,false)
+------+-------------------+------------------------+
|tripId|timeStamp          |latlon                  |
+------+-------------------+------------------------+
|2     |2007-01-06 06:23:27|[-122.436298, 37.800702]|
|6     |2007-01-04 01:04:58|[-122.429251, 37.79932] |
|8     |2007-01-03 00:59:01|[-122.444698, 37.759913]|
|11    |2007-01-06 09:08:04|[-122.422785, 37.801069]|
|14    |2007-01-02 05:18:37|[-122.422255, 37.764986]|
+------+-------------------+------------------------+
only showing top 5 rows
val uberOSMMapBoundingBoxFiltered = uberData
                                      .filter($"latlon"(0) >= minLatInOSMMap &&
                                              $"latlon"(0) <= maxLatInOSMMap &&
                                              $"latlon"(1) >= minLonInOSMMap &&
                                              $"latlon"(1) <= maxLonInOSMMap)
                                      .cache()
uberOSMMapBoundingBoxFiltered.count()
uberOSMMapBoundingBoxFiltered: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [tripId: int, timeStamp: timestamp ... 1 more field]
res1: Long = 253696
uberOSMMapBoundingBoxFiltered.show(5,false)
+------+-------------------+------------------------+
|tripId|timeStamp          |latlon                  |
+------+-------------------+------------------------+
|8     |2007-01-03 00:59:01|[-122.444698, 37.759913]|
|14    |2007-01-02 05:18:37|[-122.422255, 37.764986]|
|26    |2007-01-07 07:17:52|[-122.434058, 37.763653]|
|38    |2007-01-07 16:05:22|[-122.433124, 37.763497]|
|87    |2007-01-06 00:40:58|[-122.408277, 37.769129]|
+------+-------------------+------------------------+
only showing top 5 rows

The number of trajectory points that fall outside the bounding box of our OSM map is:

uberData.count() - uberOSMMapBoundingBoxFiltered.count()
res6: Long = 874967

We will consider a trip to be invalid when it contains fewer than two data points, since GraphHopper requires at least two. First identify all the trips that are valid.

val uberCountsFiltered = uberOSMMapBoundingBoxFiltered
                          .groupBy($"tripId".alias("validTripId"))
                          .count.filter($"count" > 1)
                          .drop("count")
uberCountsFiltered: org.apache.spark.sql.DataFrame = [validTripId: int]
uberCountsFiltered.show(5, false)
+-----------+
|validTripId|
+-----------+
|833        |
|1829       |
|3175       |
|5300       |
|5518       |
+-----------+
only showing top 5 rows

Next we join this list of valid Ids with the original data set, keeping only the entries for those trips contained in uberCountsFiltered.

val uberValidData = uberOSMMapBoundingBoxFiltered
  .join(uberCountsFiltered, uberOSMMapBoundingBoxFiltered("tripId") === uberCountsFiltered("validTripId")) // Only want trips with at least two data points
  .drop("validTripId").cache 
uberValidData: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [tripId: int, timeStamp: timestamp ... 1 more field]

Now let's see how many data points were dropped:

uberOSMMapBoundingBoxFiltered.count - uberValidData.count
res10: Long = 221
uberValidData.show(5,false)
+------+-------------------+------------------------+
|tripId|timeStamp          |latlon                  |
+------+-------------------+------------------------+
|8     |2007-01-03 00:59:01|[-122.444698, 37.759913]|
|14    |2007-01-02 05:18:37|[-122.422255, 37.764986]|
|26    |2007-01-07 07:17:52|[-122.434058, 37.763653]|
|38    |2007-01-07 16:05:22|[-122.433124, 37.763497]|
|87    |2007-01-06 00:40:58|[-122.408277, 37.769129]|
+------+-------------------+------------------------+
only showing top 5 rows

GraphHopper considers a trip to be a sequence of (latitude, longitude, time) tuples. First the relevant columns are selected from the DataFrame, and then the rows are mapped to key-value pairs with the tripId as the key. After this, the reduceByKey step merges all the (lat, lon, time) arrays for each key (trip Id), so that there is one entry per trip id containing all the relevant data points.
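The per-key merge performed by reduceByKey can be illustrated on plain Scala collections (a toy sketch, not part of the pipeline): grouping by key and concatenating the per-key arrays is the sequential analogue of what reduceByKey(_ ++ _) does in a distributed fashion.

```scala
// Toy entries mimicking (tripId, Array(timespace tuple)) pairs
val entries = Seq((1, Array(10L)), (2, Array(20L)), (1, Array(30L)))

// Group by key and concatenate each key's arrays, the sequential analogue
// of rdd.reduceByKey(_ ++ _)
val merged: Map[Int, Seq[Long]] =
  entries.groupBy(_._1).map { case (k, vs) => (k, vs.flatMap(_._2)) }
// merged(1) now holds both entries for key 1, in encounter order
```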

// To use sql api instead of rdd api
// val ubers = uberValidData.
//   select($"tripId", struct($"latlon"(0), $"latLon"(1), $"timeStamp").as("coord"))
//   .groupBy($"tripId")
//   .agg(collect_set("coord").as("coords"))
val ubers = uberValidData.select($"tripId", $"latlon", $"timeStamp")
  .map( row => {
        val id = row.get(0).asInstanceOf[Integer]
        val time = row.get(2).asInstanceOf[java.sql.Timestamp].getTime
        // Array(lat, lon)
        val latlon = row.get(1).asInstanceOf[scala.collection.mutable.WrappedArray[Double]] 
        val entry = Array((latlon(0), latlon(1), time))
        (id, entry)
        }
      )
.rdd.reduceByKey( (e1, e2) => e1 ++ e2) // Sequence of timespace tuples
.cache
ubers: org.apache.spark.rdd.RDD[(Integer, Array[(Double, Double, Long)])] = ShuffledRDD[1195] at reduceByKey at command-2971213210274838:11
ubers.count
res12: Long = 8321
ubers.take(1) // first of 8,321 trip ids prepped and ready for map-matching
res13: Array[(Integer, Array[(Double, Double, Long)])] = Array((2100,Array((-122.430268,37.766517,1168142813000), (-122.430456,37.766368,1168142819000), (-122.430588,37.766267,1168142825000), (-122.430874,37.766065,1168142873000), (-122.431452,37.765596,1168142879000), (-122.43189,37.76524,1168142885000), (-122.432244,37.764965,1168142891000), (-122.432537,37.764759,1168142897000), (-122.432833,37.764534,1168142903000), (-122.433421,37.764042,1168142909000), (-122.434094,37.763526,1168142915000), (-122.406513,37.771497,1168142387000), (-122.40595,37.770276,1168142393000), (-122.406148,37.769156,1168142399000), (-122.407442,37.768924,1168142405000), (-122.409003,37.76907,1168142411000), (-122.410424,37.76931,1168142417000), (-122.412292,37.769523,1168142423000), (-122.414228,37.769585,1168142429000), (-122.416105,37.769652,1168142435000), (-122.4181,37.76988,1168142441000), (-122.419548,37.770068,1168142447000), (-122.420887,37.770144,1168142453000), (-122.422174,37.770579,1168142459000), (-122.422506,37.770788,1168142465000), (-122.422915,37.771149,1168142471000), (-122.422932,37.771242,1168142477000), (-122.423008,37.771513,1168142483000), (-122.423167,37.771772,1168142489000), (-122.423186,37.771882,1168142507000), (-122.423467,37.771914,1168142639000), (-122.42389,37.771557,1168142645000), (-122.424358,37.771214,1168142651000), (-122.42451,37.771112,1168142657000), (-122.424577,37.77107,1168142699000), (-122.424858,37.770832,1168142705000), (-122.425321,37.770462,1168142711000), (-122.425981,37.769912,1168142717000), (-122.426489,37.769485,1168142723000), (-122.42671,37.769349,1168142729000), (-122.426785,37.769304,1168142735000), (-122.427076,37.769072,1168142741000), (-122.427508,37.768713,1168142747000), (-122.42785,37.768438,1168142753000), (-122.427882,37.768396,1168142759000), (-122.427958,37.768333,1168142765000), (-122.428191,37.76816,1168142783000), (-122.428687,37.767765,1168142789000), (-122.429273,37.767284,1168142795000), 
(-122.429752,37.766923,1168142801000), (-122.430132,37.766626,1168142807000))))
  4. Start Map Matching

Now, stepping into GraphHopper land, we first define some utility functions for interfacing with the GraphHopper map-matching library. This requires attaching the following artefact: com.graphhopper:map-matching:0.6.0

This function takes a MatchResult from GraphHopper and converts it into an Array of (lon, lat) points.

def extractLatLong(mr: MatchResult): Array[(Double, Double)] = {
  val pointsList = mr.getEdgeMatches.asScala.zipWithIndex
                    .map{ case  (e, i) =>
                              if (i == 0) e.getEdgeState.fetchWayGeometry(3) // FetchWayGeometry returns vertices on graph if 2,
                              else e.getEdgeState.fetchWayGeometry(2) }      // and edges if 3 
                    .map{case pointList => pointList.asScala.toArray}
                    .flatMap{ case e => e}
  val latLongs = pointsList.map(point => (point.lon, point.lat)).toArray

  latLongs   
}
extractLatLong: (mr: com.graphhopper.matching.MatchResult)Array[(Double, Double)]

The following returns a new GraphHopper object and encoder. It reads the pre-generated graphhopper 'database' from the dbfs; this way multiple GraphHopper objects can be created on the workers, all reading from the same shared database.

Currently the documentation is scattered all over the place, if it exists at all. The method to create the graph as specified in the map-matching repository differs from the main GraphHopper repository. The API should hopefully converge as GraphHopper matures.

See the main graphHopper documentation here, and the map-matching documentation here.

This function returns a new GraphHopper object, with all settings defined, reading the graph from the location in dbfs. Note: setAllowWrites(false) ensures that multiple GraphHopper objects can read from the same files simultaneously.

def getHopper = {
    val enc = new CarFlagEncoder() // Vehicle type
    val hopp = new GraphHopper()
    .setStoreOnFlush(true)
    .setCHWeightings("shortest")    // Contraction Hierarchy settings
    .setAllowWrites(false)         // Avoids issues when reading graph object from HDFS
    .setGraphHopperLocation("/dbfs/files/graphhopper/graphHopperData")
    .setEncodingManager(new EncodingManager(enc))
  hopp.importOrLoad()
  
  (hopp, enc)
}
getHopper: (com.graphhopper.GraphHopper, com.graphhopper.routing.util.CarFlagEncoder)

The next step does the actual map matching. It begins by creating a new GraphHopper object for each partition; this is done because the GraphHopper objects themselves are not Serializable and so must be created on the partitions themselves to avoid the serialization step.

Once all the GraphHopper and MapMatching objects are created and initialised, map matching runs for each trajectory on that partition. The actual map matching is done in the mm.doWork() call, which returns a MatchResult object (it is wrapped in a Try statement since an exception is raised when no match is found). With this MatchResult, failed matches are filtered out and replaced by dummy data; when successful, the coordinates of the matched points are extracted into an array of (latitude, longitude).

The last (optional) step estimates the time taken to get from one matched point to another, as no time information is retained after the data has been map matched. This is a rather crude way of doing this and more sophisticated methods would be preferable.
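The time-offset bookkeeping above can be sketched in isolation with made-up leg times (the numbers are illustrative, not GraphHopper routing output): sliding(2) pairs up consecutive points, and scanLeft turns per-leg durations into cumulative offsets from the start.

```scala
// Hypothetical per-leg travel times in milliseconds; in the notebook these
// come from routing each consecutive pair of map-matched points
val legTimes = Seq(3000L, 5000L, 2000L)

// scanLeft prepends the start offset 0 and accumulates the sums, so the
// i-th offset is the estimated time from the first point to point i
val timeOffsets = legTimes.scanLeft(0L)(_ + _).toList
// timeOffsets == List(0L, 3000L, 8000L, 10000L)

// sliding(2) is what produces the consecutive point pairs that get routed
val points = Seq((37.770, -122.422), (37.771, -122.421), (37.772, -122.420))
val pairs = points.sliding(2).toList // two windows of two points each
```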

Let's recall this most useful transformation first!

mapPartitions

Return a new RDD by applying a function to each partition of the RDD.

// let's look at a simple example of mapPartitions in action
val x = sc.parallelize(Array(1,2,3), 2) // RDD with 2 partitions
x: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[96] at parallelize at command-2971213210274852:2
// our baby function we will call
def f(i:Iterator[Int])={ (i.sum, 42).productIterator }
f: (i: Iterator[Int])Iterator[Any]
val y = x.mapPartitions(f)
y: org.apache.spark.rdd.RDD[Any] = MapPartitionsRDD[97] at mapPartitions at command-2971213210274854:1
// glom() gathers the elements of each partition into an array
val xOut = x.glom().collect()
xOut: Array[Array[Int]] = Array(Array(1), Array(2, 3))
val yOut = y.glom().collect() // we can see the mapPartitions with f applied to each partition
yOut: Array[Array[Any]] = Array(Array(1, 42), Array(5, 42))

Having understood the basic power of the mapPartitions transformation, let's get back to the map-matching problem at hand.

val matchTrips = ubers
  .mapPartitions(partition => {
    // Create the map matching object only once for each partition
    val (hopp, enc) = getHopper

    val tripGraph = hopp.getGraphHopperStorage()
    val locationIndex = new LocationIndexMatch(tripGraph,
                                               hopp.getLocationIndex().asInstanceOf[LocationIndexTree])

    val mm = new MapMatching(tripGraph, locationIndex, enc)
    
    def extractLatLong(mr: MatchResult): Array[(Double, Double)] = {
      val pointsList = mr.getEdgeMatches.asScala.zipWithIndex
                    .map{ case  (e, i) =>
                              if (i == 0) e.getEdgeState.fetchWayGeometry(3) // FetchWayGeometry returns vertices on graph if 2,
                              else e.getEdgeState.fetchWayGeometry(2) }      // and edges if 3 
                    .map{case pointList => pointList.asScala.toArray}
                    .flatMap{ case e => e}
      val latLongs = pointsList.map(point => (point.lon, point.lat)).toArray

      latLongs   
    }

    // Map matching parameters
    // Have not found any documentation on what these do, other than comments in source code
    //   mm.setMaxSearchMultiplier(2000)
    mm.setSeparatedSearchDistance(600)
    mm.setForceRepair(true)

    // Do the map matching for each trajectory
    val matchedPartition = partition.map{case (key, dataPoints) => {

      val sortedPoints = dataPoints.sortWith( (a, b) => a._3 < b._3) // Sort by time
      val gpxEntries = sortedPoints.map{ case (lat, lon, time) => new GPXEntry(lon, lat, time)}.toList.asJava

      val mr = Try(mm.doWork(gpxEntries)) // mapMatch the trajectory, Try() wraps the exception when no match can be found
      val points = mr match {
        case Success(result) => {
          val pointsList = result.getEdgeMatches.asScala.zipWithIndex // (edge, index tuple)
                      .map{ case  (e, i) =>
                                if (i == 0) e.getEdgeState.fetchWayGeometry(3) // FetchWayGeometry returns verts on graph if 2,
                                else e.getEdgeState.fetchWayGeometry(2)        // and edges if 3 (I'm pretty sure that's the case)
                      }      
                      .map{case pointList => pointList.asScala.toArray}
                      .flatMap{ case e => e}
          val latLongs = pointsList.map(point => (point.lon, point.lat)).toArray

          latLongs
        }
        case Failure(_) => Array[(Double, Double)]() // When no match can be made
      }

      // Use GraphHopper routing to get time estimates of the new matched trajectory
      /// NOTE: Currently only calculates time offsets from 0
      val times = points.iterator.sliding(2).map{ pair =>
        val (lonFrom, latFrom) = pair(0)
        val (lonTo, latTo) = pair(1)

        val req = new GHRequest(latFrom, lonFrom, latTo, lonTo)
            .setWeighting("shortest")
            .setVehicle("car")
            .setLocale("US")

//         val time = hopp.route(req).getTime -- using new method
        val time = hopp.route(req).getBest.getTime
        time
      }
    
    val timeOffsets = times.scanLeft(0.toLong){ (a: Long, b: Long) => a + b }.toList
    
    (key, points.zip(timeOffsets)) // Return a tuple of (key, Array((lat, lon), timeOffSetFromStart))
  }}
  
  matchedPartition
}).cache
matchTrips: org.apache.spark.rdd.RDD[(Integer, Array[((Double, Double), Long)])] = MapPartitionsRDD[1196] at mapPartitions at command-2971213210274858:2
display(matchTrips.toDF.limit(2))
// Define the schema of the points in a map matched trip
case class UberMatched(id: Int, lat: Double, lon: Double, time: Long) 
defined class UberMatched

Here we convert the map-matched points into a dataframe and explore some properties of the matched points.

// Create a dataframe to better explore the matched trajectories, make sure it is sensible
val matchTripsDF = matchTrips.map{case (id, points) => 
  points.map(point => UberMatched(id, point._1._1, point._1._2, point._2 ))
}
.flatMap(uberMatched => uberMatched)
.toDF.cache
matchTripsDF: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [id: int, lat: double ... 2 more fields]
matchTripsDF.groupBy($"id").count.orderBy(-$"count").show(10)
+-----+-----+
|   id|count|
+-----+-----+
|11721|  418|
|23602|  264|
| 3586|  250|
| 3719|  247|
| 5783|  225|
|10858|  217|
| 7092|  212|
|10842|  212|
|12734|  212|
| 1333|  208|
+-----+-----+
only showing top 10 rows

Finally it is helpful to be able to visualise the results of the map matching.

These next few steps take the map matched trips and convert them into json using the Spray-Json library. See here for documentation on the library.

To make the visualisation less cluttered, only two trips will be selected, though little would have to be done to extend this to multiple or all of the trajectories.

Here we select only those points that belong to the trips with ids 11721 and 10858, chosen because they contain the most points after map matching.

val filterTrips = matchTrips.filter{case (id, values) => id == 11721 || id == 10858 }.cache
filterTrips: org.apache.spark.rdd.RDD[(Integer, Array[((Double, Double), Long)])] = MapPartitionsRDD[115] at filter at command-2971213210274866:1

Next we define a schema for the JSON representation of a trajectory. Then the filtered trips are collected to the driver and converted to JSON strings.

// Convert our Uber data points into GeoJson Geometries
// Is not fully compliant with the spec but in a format that Leaflet understands
case class UberData(`type`: String = "MultiPoint",
                    coordinates: Array[(Double, Double)])

object UberJsonProtocol extends DefaultJsonProtocol {
  implicit val uberDataFormat = jsonFormat2(UberData)
}

import UberJsonProtocol._

val mapMatchedTrajectories = filterTrips.collect.map{case (key, matchedPoints) => { // change filterTrips to matchTrip to get all matched trajectories as json
  val jsonString = UberData(coordinates = matchedPoints.map{case ((lat, lon), time) => (lat, lon)}).toJson.prettyPrint
  jsonString
}}
defined class UberData
defined object UberJsonProtocol
import UberJsonProtocol._
mapMatchedTrajectories: Array[String] =
Array({
  "type": "MultiPoint",
  "coordinates": [[-122.42245184084292, 37.77072625839843], [-122.42243656715236, 37.77058581495104], [-122.42240359833248, 37.77025929324908], [-122.4224008043647, 37.77022911839699], [-122.4223676492803, 37.76987353943006], [-122.42172391910233, 37.76991358630166], [-122.42160135704879, 37.76992345832117], [-122.4214882944857, 37.76993705563107], [-122.42136852639992, 37.76995251558615], [-122.42125322866261, 37.769969651921905], [-122.42113383310587, 37.76998883716737], [-122.42100940840713, 37.76997337721229], [-122.42095278399331, 37.769978406354305], [-122.42077732281635, 37.76999368004487], [-122.42028837845373, 37.77000932626447], [-122.42018705055536, 37.7700085812064], [-122.42019748136842, 37.7699178703856], [-122.4201181326833, 37.769236142245745], [-122.42008423254082, 37.76918175300617], [-122.4200527538371, 37.76923651477478], [-122.4200313334174, 37.76943265131338], [-122.42002574548182, 37.76953025392138], [-122.42001345202357, 37.769660266555704], [-122.41998607113926, 37.7699178703856], [-122.41996651336476, 37.770005973503125], [-122.41984357878216, 37.77000057183207], [-122.41926019830838, 37.769963877721814], [-122.41909088386053, 37.76994469247635], [-122.41891113859961, 37.7699178703856], [-122.41878541004922, 37.76989794008206], [-122.41829665195115, 37.76981821886789], [-122.41782111863392, 37.769741664150544], [-122.41772891769698, 37.76972694925354], [-122.41753948668106, 37.76969547054981], [-122.4173642117686, 37.76966864845906], [-122.41688439436743, 37.76960867128392], [-122.41674693115235, 37.76959824047085], [-122.41665435768637, 37.7695909761546], [-122.41619428432422, 37.769570859586544], [-122.41606762445124, 37.76956285021222], [-122.4155909735469, 37.76953304788917], [-122.41450244369736, 37.76950417688871], [-122.41378495276983, 37.76948741308199], [-122.41340273797667, 37.769474560830176], [-122.41220785108673, 37.76943842551347], [-122.41199159798008, 37.76943209251982], [-122.41131154622089, 37.76938534012553], 
[-122.41121822769682, 37.769376399428616], [-122.41115731919909, 37.769370625228525], [-122.41101464057746, 37.769354420215365], [-122.4109047445112, 37.76943488648761], [-122.41077063405746, 37.76954627267002], [-122.4104606898977, 37.76979866109338], [-122.41031279586954, 37.76991340003714], [-122.41022636913269, 37.76998045526401], [-122.40974413029278, 37.77035447441834], [-122.40962901881998, 37.770334357850274], [-122.40929392895015, 37.770282017520415], [-122.40906240215293, 37.77022297166786], [-122.40884540398818, 37.77013673119553], [-122.40854067523496, 37.76996927939287], [-122.40832609850897, 37.7698349826746], [-122.40775855051932, 37.769472139391425], [-122.40708874330868, 37.769007595680826], [-122.40683840379502, 37.768820027310106], [-122.4063531847228, 37.76846202690442], [-122.4059715287232, 37.76817015040301], [-122.40576030475856, 37.76796693581269], [-122.40560086233022, 37.767732987576714], [-122.40553455216143, 37.76757298635482], [-122.40549711299309, 37.767405162023124], [-122.405491338793, 37.76732115672502], [-122.40548221183155, 37.767276453240434], [-122.40541385275306, 37.76674075648354], [-122.40533245515822, 37.76631756349618], [-122.40525645923442, 37.76605064644033], [-122.40513762247124, 37.76481217365292], [-122.40511340808376, 37.764428096214566], [-122.40511191796762, 37.764387118020366], [-122.40511229049666, 37.764303298986775], [-122.40512979936145, 37.76398329654299], [-122.40516705226527, 37.76372625150665], [-122.40522740196946, 37.76346250094762], [-122.40535387557792, 37.763090530703], [-122.40559397054301, 37.76260307645656], [-122.4061734394619, 37.76165648017056], [-122.40629581525093, 37.76140763077306], [-122.40639081015566, 37.76115263464643], [-122.40645637526639, 37.76089186431971], [-122.40648915782174, 37.76062345714771], [-122.40649027540886, 37.760358775266084], [-122.4064604730858, 37.76009297579735], [-122.40635430230992, 37.759696232371695], [-122.40624906285665, 37.75945166705813], 
[-122.40603057457575, 37.75908621607169], [-122.40585082931483, 37.758859904680996], [-122.40563178224039, 37.758637691109726], [-122.405418136837, 37.75845440682294], [-122.4044393167892, 37.75773505325023], [-122.40410292306773, 37.757449137213435], [-122.40381235041795, 37.75713416391166], [-122.40364862390567, 37.75690058820472], [-122.40351767994875, 37.75665658168472], [-122.40338207937886, 37.75629262081443], [-122.40328447677086, 37.75577648183204], [-122.40304456807027, 37.753280164747245], [-122.40301904983116, 37.75275210483563], [-122.40302016741828, 37.75262414111102], [-122.40311255461974, 37.75222106469172], [-122.40330012299046, 37.75159651975922], [-122.40334724791379, 37.75146576206682], [-122.40350594528405, 37.75110049734489], [-122.40397365549148, 37.74993131495859], [-122.40405896464122, 37.749760324130065], [-122.40410981485493, 37.749678367741666], [-122.40420015314669, 37.74960479325663], [-122.40431377450334, 37.749536434178125], [-122.40456467281054, 37.749446282150885], [-122.40481277714996, 37.74940493142765], [-122.40491466384191, 37.7494112644213], [-122.40501748185643, 37.749430263402246], [-122.40516649347171, 37.7494883779322], [-122.40536486518454, 37.7495772261078], [-122.40538982463009, 37.74960702843086], [-122.40553064060653, 37.74984898604115], [-122.40555466872948, 37.74989443458381], [-122.4057696179845, 37.750304030261276], [-122.40605516149228, 37.75084438363115], [-122.40610843314472, 37.751026177801776], [-122.40614456846143, 37.75138157050419], [-122.4061715768167, 37.75163172375333], [-122.40617921366199, 37.751738453322766], [-122.406184242804, 37.75181538056915], [-122.40625129803087, 37.75252877367725], [-122.4062878058766, 37.752933153948184], [-122.4062905998444, 37.75300225808476], [-122.40641278936891, 37.752984004161895], [-122.40651244088663, 37.75296928926489], [-122.4072226674979, 37.75292011543185], [-122.40732176022206, 37.75291359617368], [-122.40740557925565, 37.752908380767146], [-122.40813294220268, 
37.75286479486968], [-122.40821378100397, 37.75285976572766], [-122.40829014945679, 37.75285548164373], [-122.40903148224275, 37.752812454539814], [-122.40914864262525, 37.75280556275261], [-122.40927343985304, 37.75279829843637], [-122.41001421384546, 37.7527547125389], [-122.41008909218212, 37.752750242190444], [-122.41016657822206, 37.752745585577465], [-122.41090493077573, 37.7527010683574], [-122.41098223055114, 37.75269622547991], [-122.41105971659108, 37.75269138260241], [-122.41179974552541, 37.7526483554985], [-122.41189567175275, 37.75264202250485], [-122.41203499761302, 37.752632895543414], [-122.41225963262303, 37.75261780811737], [-122.41248482642662, 37.752603093220365], [-122.41289721607187, 37.752578133774804], [-122.41293372391762, 37.7525760848651], [-122.41301288633822, 37.752571241987596], [-122.41308403938451, 37.75256677163914], [-122.41311160653333, 37.75256528152299], [-122.4135172906559, 37.75254088087099], [-122.41380134404751, 37.75252374453523], [-122.41398928494728, 37.752512382399566], [-122.41410234751037, 37.75250567687688], [-122.41423422278987, 37.752498598825156], [-122.41510854844246, 37.75244514090818], [-122.41518864218567, 37.752440298030685], [-122.41528102938715, 37.75243452383059], [-122.41615107095579, 37.752382556029765], [-122.41628164238367, 37.752374546655446], [-122.41641891933423, 37.75236709607468], [-122.41644611395402, 37.752364674635935], [-122.41684248485065, 37.75234083277749], [-122.41700099595639, 37.75233133328702], [-122.41727815756079, 37.7523145694803], [-122.41737389752359, 37.75230898154472], [-122.41747615674457, 37.75230264855108], [-122.41789525191251, 37.75227768910552], [-122.4180444497923, 37.75226856214408], [-122.4181517381553, 37.75226204288592], [-122.41833707135179, 37.75225049448573], [-122.41846335869573, 37.752243416434005], [-122.41859653782687, 37.75223577958872], [-122.41864999574386, 37.75223279935642], [-122.41899514389772, 37.752213427846435], [-122.41903332812413, 
37.752211192672206], [-122.41945596231794, 37.75218586069761], [-122.41956958367459, 37.75217971396848], [-122.41967482312786, 37.75217393976839], [-122.42008460506986, 37.752145813826004], [-122.42021722540746, 37.75213799071621], [-122.42044968352727, 37.75212383461275], [-122.4206655641049, 37.75211098236094], [-122.42096768515485, 37.752092728438065], [-122.42114370512539, 37.752082297624995], [-122.42159576911321, 37.75205491674069], [-122.42274613878308, 37.75198581260411], [-122.42283442816513, 37.75198059719757], [-122.4229268153666, 37.75197482299748], [-122.42305887691063, 37.7519671861522], [-122.42399448359001, 37.75191167932552], [-122.4245283177017, 37.75187908303467], [-122.42509512063329, 37.75184443783412], [-122.42524282839692, 37.75183568340173], [-122.42542350498043, 37.7518246937951], [-122.4255550077309, 37.751816870685296], [-122.42580981759302, 37.75180159699473], [-122.4263317307755, 37.751770118291006], [-122.42730943323619, 37.7517114449675], [-122.4274325540833, 37.75170418065125], [-122.42744857283193, 37.751703249328656], [-122.4274705520452, 37.75170194547702], [-122.42845365617693, 37.751642527095434], [-122.42854771975907, 37.75163693915986], [-122.42866134111571, 37.75163004737266], [-122.42952374583908, 37.75157826583635], [-122.42965878761542, 37.75157007019751], [-122.42982530809549, 37.751560011913476], [-122.43069106558019, 37.751508044112654], [-122.43085479209248, 37.75149817209314], [-122.43102913568234, 37.75148774128007], [-122.43173936229361, 37.7514450867052], [-122.43187682550871, 37.75143670480184], [-122.4320485613953, 37.751426087724255], [-122.43314174785782, 37.75135884623287], [-122.43372531459612, 37.75132289718068], [-122.43381379024268, 37.75131730924511], [-122.4341086469764, 37.75129924158676], [-122.43445211874959, 37.75127875248966], [-122.4358818851981, 37.751193443339915], [-122.43617022267364, 37.75117630700416], [-122.43630042157248, 37.75116848389436], [-122.43645744256207, 37.75115898440389], 
[-122.4383871429798, 37.75104089269878], [-122.43849163737502, 37.75103455970513], [-122.43850057807194, 37.751126574377565], [-122.43856800582785, 37.751832516904905], [-122.43864437428067, 37.75263028784015], [-122.4387216740561, 37.753435881885196], [-122.43873191860465, 37.75354372904175], [-122.43879766997988, 37.7542293687365], [-122.43890384075576, 37.75533708383151], [-122.43888428298125, 37.75541363854886], [-122.43902025608018, 37.75539706100666], [-122.4396468499224, 37.75531994749576], [-122.43996908754042, 37.755253451062444], [-122.44005495548372, 37.75522681523621], [-122.44018049776957, 37.75523631472669], [-122.44025556237077, 37.75527729292089], [-122.44029169768747, 37.755338946476705], [-122.44012536347192, 37.75564386149445], [-122.44010282546512, 37.75574537565735], [-122.4400860616584, 37.75581652870364], [-122.44008382648417, 37.75598342171274], [-122.44015367567883, 37.75641872189385], [-122.44026208162893, 37.75640996746145], [-122.44090711565853, 37.75625946573003], [-122.44103526564766, 37.756229477142455], [-122.44106059762225, 37.75627492568511], [-122.44112187864904, 37.75635855845418], [-122.44121538343762, 37.756444240132964], [-122.44141412767948, 37.756549852115285], [-122.4415674233787, 37.756590830309484], [-122.44202265386335, 37.756654346510494], [-122.44210684542597, 37.75667818836894], [-122.44217688088514, 37.756704451666124], [-122.44224374984749, 37.75673611663437], [-122.4423139715712, 37.756780075060874], [-122.44248309975453, 37.75692554765028], [-122.44293516374235, 37.75754711235047], [-122.44309535122876, 37.757728533992065], [-122.44316426910082, 37.75779689307057], [-122.4432460392247, 37.75787363405243], [-122.44332892693569, 37.75794571842132], [-122.44340939320794, 37.75801053847396], [-122.44347458578962, 37.75805915351344], [-122.44363905735997, 37.75816737319903], [-122.44366681077332, 37.75818618591546], [-122.44371896483867, 37.75821431185784], [-122.44376702108458, 37.75823852624532], 
[-122.44390988597073, 37.758302787504405], [-122.44451822589006, 37.75858255681207], [-122.44463631759515, 37.75867103245864], [-122.44473671417094, 37.75878297743461], [-122.44477601598447, 37.75884183702264], [-122.44480879853984, 37.758906657075286], [-122.44486784439239, 37.75916761366653], [-122.4448445613275, 37.75937958268925], [-122.44482053320453, 37.7594542747614], [-122.44479408364283, 37.75950903653001], [-122.44468958924762, 37.7596699690745], [-122.44464693467275, 37.75971746652687], [-122.44469797115097, 37.75970126151371], [-122.44479464243638, 37.759634392551355], [-122.44495222221953, 37.75936002491474], [-122.44499599438151, 37.759316997810835], [-122.44500102352353, 37.75924044309349], [-122.44499841582027, 37.759163515847106], [-122.44498891632979, 37.75908938256851], [-122.44497196625855, 37.75901599434799], [-122.44487902026353, 37.758804956647865], [-122.44477582971996, 37.75868388471046], [-122.44463929782746, 37.75857193973449], [-122.44448469827663, 37.75848383661696], [-122.44389088698978, 37.75821021403842], [-122.44382643946618, 37.75816979463778], [-122.44374783583912, 37.75811615045628], [-122.44356213011359, 37.75798408891225], [-122.44349786885451, 37.75793603266633], [-122.44342261798879, 37.75787586922666], [-122.44334289677462, 37.75780657882556], [-122.44326038159267, 37.75773207301793], [-122.44312887884219, 37.7576069032611], [-122.44297241664616, 37.75744373554238], [-122.44256282096869, 37.75688196175282], [-122.44244510179263, 37.75677020304136], [-122.44230652099043, 37.75667911969153], [-122.44223164265375, 37.756641121729636], [-122.4421433532717, 37.7566085254388], [-122.44196379427531, 37.75656382195422], [-122.44160039219857, 37.756517814618], [-122.44146106633829, 37.756474973778616], [-122.44134465101386, 37.75641015372597], [-122.44121985378608, 37.756303424156535], [-122.44115093591401, 37.75620265505171], [-122.44111182036501, 37.75607413253354], [-122.44111535939086, 37.755970941989965], [-122.44114665183008, 
37.75585154643323], [-122.44120681526974, 37.75573867013466], [-122.44167303536102, 37.7551027630665], [-122.44191704188101, 37.754524970528294], [-122.44245217984435, 37.75379239217473], [-122.44253003841334, 37.753632204688316], [-122.44257902598186, 37.75346847817604], [-122.44267327582851, 37.75309091999585], [-122.44279043621101, 37.75262209220131], [-122.44285097217971, 37.752379575797455], [-122.44288189208989, 37.75225645495034], [-122.4430519515958, 37.751575099339526], [-122.4430664802283, 37.75151754360313], [-122.44338983543344, 37.7502222601374], [-122.44341144211765, 37.75013564713603], [-122.44343230374379, 37.750052386895995], [-122.44350159414489, 37.74977522529159], [-122.4435749823654, 37.74950029886142], [-122.4436636442765, 37.749304348587344], [-122.44415277490361, 37.74863696281545], [-122.44424609342768, 37.74845721755454], [-122.44427161166679, 37.748329998888], [-122.44427012155063, 37.74820035878272], [-122.4441691661813, 37.74768254341966], [-122.44418928274935, 37.74752719881074], [-122.44425745556335, 37.747374834434126], [-122.44429396340908, 37.74732715071724], [-122.44433289269357, 37.74728486867141], [-122.44444018105656, 37.74719136388283], [-122.44456050793589, 37.74711704433971], [-122.44468809913147, 37.74706898809379], [-122.44479911278485, 37.74704551876438], [-122.44487902026353, 37.747034156628715], [-122.44489075492824, 37.747117789397784], [-122.44492279242552, 37.7473433557304], [-122.4450084741043, 37.747946293978686], [-122.44501257192371, 37.74798112544375], [-122.44501517962698, 37.748013721734594], [-122.44501946371092, 37.74809158030357], [-122.44503306102081, 37.74830839220379], [-122.44503995280802, 37.74837339852095], [-122.4450563440857, 37.748503224890754], [-122.44507087271819, 37.74861815009903], [-122.44505131494368, 37.74878560190169], [-122.44488144170228, 37.74909815376471], [-122.4448533157599, 37.7492112163278], [-122.44484679650174, 37.74925498848978], [-122.44483059148857, 37.749368423581906], 
[-122.44492614518686, 37.74959305859193], [-122.44498053442643, 37.749668495722155], [-122.44528135162476, 37.74993429519089], [-122.44562314701729, 37.750238278886044], [-122.44574682665795, 37.75043348410205], [-122.44575558109035, 37.7505033332967], [-122.44575166953545, 37.750522704806684], [-122.44574515027729, 37.75054095872956], [-122.44544843089838, 37.75082389453405], [-122.4453169281479, 37.75089355746419], [-122.44528936099908, 37.75090119430947], [-122.44528805714745, 37.75093751589069], [-122.44510607671229, 37.75134431760038], [-122.44509694975086, 37.75137747268477], [-122.44508670520231, 37.75141751955638], [-122.44506584357617, 37.75165947716667], [-122.44510309648, 37.75186101537632], [-122.4451695929133, 37.752015428662645], [-122.44545010727904, 37.75251424504476], [-122.44547413540201, 37.752598250342864], [-122.44545588147915, 37.752727145390075], [-122.44527408730852, 37.753161700513104], [-122.44525415700497, 37.753356719464584], [-122.44525061797911, 37.75338354155534], [-122.44524633389517, 37.75340552076859], [-122.44513979059025, 37.75363816515293], [-122.44493564467733, 37.75390079812484], [-122.44490062694774, 37.75394196258355], [-122.44486858945046, 37.75398405836487], [-122.44471957783519, 37.75424352483996], [-122.4446718941183, 37.75438974248744], [-122.44464805225986, 37.75454378324473], [-122.44466183583428, 37.75478704470665], [-122.4447605560294, 37.75506811786595], [-122.44474695871949, 37.75521061022305], [-122.44468083481522, 37.75530877162461], [-122.44456628213598, 37.75540674676165], [-122.44441969195947, 37.7554950361437], [-122.44432525584828, 37.75551329006657], [-122.44423622140816, 37.75551235874397]]
}, {
  "type": "MultiPoint",
  "coordinates": [[-122.41189697560438, 37.772072950871426], [-122.4115702676379, 37.77181236680923], [-122.41128695930436, 37.7715866142121], [-122.4109047445112, 37.771281140400795], [-122.41059219264818, 37.77103489870656], [-122.41056462549935, 37.77101273322879], [-122.41011107139538, 37.77064635091975], [-122.40974413029278, 37.77035447441834], [-122.40962901881998, 37.770334357850274], [-122.40929392895015, 37.770282017520415], [-122.40906240215293, 37.77022297166786], [-122.40884540398818, 37.77013673119553], [-122.40854067523496, 37.76996927939287], [-122.40832609850897, 37.7698349826746], [-122.40775855051932, 37.769472139391425], [-122.40708874330868, 37.769007595680826], [-122.40683840379502, 37.768820027310106], [-122.4063531847228, 37.76846202690442], [-122.4059715287232, 37.76817015040301], [-122.40576030475856, 37.76796693581269], [-122.40560086233022, 37.767732987576714], [-122.40553455216143, 37.76757298635482], [-122.40549711299309, 37.767405162023124], [-122.405491338793, 37.76732115672502], [-122.40548221183155, 37.767276453240434], [-122.40541385275306, 37.76674075648354], [-122.40533245515822, 37.76631756349618], [-122.40525645923442, 37.76605064644033], [-122.40513762247124, 37.76481217365292], [-122.40511340808376, 37.764428096214566], [-122.40511191796762, 37.764387118020366], [-122.40511229049666, 37.764303298986775], [-122.40512979936145, 37.76398329654299], [-122.40516705226527, 37.76372625150665], [-122.40522740196946, 37.76346250094762], [-122.40535387557792, 37.763090530703], [-122.40559397054301, 37.76260307645656], [-122.4061734394619, 37.76165648017056], [-122.40629581525093, 37.76140763077306], [-122.40639081015566, 37.76115263464643], [-122.40645637526639, 37.76089186431971], [-122.40648915782174, 37.76062345714771], [-122.40649027540886, 37.760358775266084], [-122.4064604730858, 37.76009297579735], [-122.40635430230992, 37.759696232371695], [-122.40624906285665, 37.75945166705813], [-122.40603057457575, 37.75908621607169], 
[-122.40585082931483, 37.758859904680996], [-122.40563178224039, 37.758637691109726], [-122.405418136837, 37.75845440682294], [-122.4044393167892, 37.75773505325023], [-122.40410292306773, 37.757449137213435], [-122.40381235041795, 37.75713416391166], [-122.40364862390567, 37.75690058820472], [-122.40351767994875, 37.75665658168472], [-122.40338207937886, 37.75629262081443], [-122.40328447677086, 37.75577648183204], [-122.40304456807027, 37.753280164747245], [-122.40301904983116, 37.75275210483563], [-122.40302016741828, 37.75262414111102], [-122.40311255461974, 37.75222106469172], [-122.40330012299046, 37.75159651975922], [-122.40334724791379, 37.75146576206682], [-122.40347316272869, 37.751091370383456], [-122.40387642541252, 37.749896297229], [-122.40389225789664, 37.74986388720268], [-122.4039377064393, 37.74981415457608], [-122.40402860352461, 37.74973908997489], [-122.40416439035903, 37.749664397902734], [-122.40463563959231, 37.749497691158155], [-122.40476937751701, 37.749389285208046], [-122.40483326624707, 37.74926839953516], [-122.40483922671167, 37.74913950448795], [-122.40478558253018, 37.74901731496343], [-122.40468276451564, 37.74891375189082], [-122.4044149161372, 37.74871090982953], [-122.40412471601645, 37.74848776493567], [-122.4040509552669, 37.74841344539255], [-122.40398799785945, 37.74834452752049], [-122.40392485418748, 37.74825120899643], [-122.4038879738127, 37.74815118494968], [-122.40387288638665, 37.74812250021374], [-122.40381253668247, 37.74796361657896], [-122.40379316517249, 37.74780044886024], [-122.40382296749554, 37.74764249654805], [-122.40391088434855, 37.747388059214984], [-122.40431284318073, 37.746326537720705], [-122.40433333227783, 37.746106559323664], [-122.40432364652284, 37.74600411383817], [-122.40428974638037, 37.745946185572734], [-122.40423945496022, 37.74590185461719], [-122.40413365671337, 37.74586124895203], [-122.40404741624104, 37.74586571930049], [-122.40396136203321, 37.74589906064941], [-122.4038605929284, 
37.74601007430278], [-122.40383228072149, 37.746066326187545], [-122.4036722794996, 37.74691699624621], [-122.40364229091202, 37.74708984971992], [-122.40366985806085, 37.74729269178121], [-122.40359535225322, 37.747486779410096], [-122.40360392042109, 37.74773451122048], [-122.40361826278907, 37.74788389536479], [-122.40362031169877, 37.74791481527496], [-122.40367246576412, 37.747960636346654], [-122.40376876452048, 37.7482245731702], [-122.40384587803139, 37.74835011545606], [-122.4040069968404, 37.74854867343341], [-122.40419028112717, 37.74872767363625], [-122.40459410260455, 37.74900744294392], [-122.40468965630285, 37.749071704203004], [-122.40472877185185, 37.74909796750019], [-122.40478297482692, 37.749135220404014], [-122.40497072946215, 37.749265046773814], [-122.40505249958602, 37.74932129865858], [-122.40513762247124, 37.74937978571757], [-122.40533990573898, 37.7495498452235], [-122.40536486518454, 37.7495772261078], [-122.40538982463009, 37.74960702843086], [-122.40553064060653, 37.74984898604115], [-122.40555466872948, 37.74989443458381], [-122.4057696179845, 37.750304030261276], [-122.40605516149228, 37.75084438363115], [-122.40610843314472, 37.751026177801776], [-122.40614456846143, 37.75138157050419], [-122.4061715768167, 37.75163172375333], [-122.40617921366199, 37.751738453322766], [-122.406184242804, 37.75181538056915], [-122.40625129803087, 37.75252877367725], [-122.4062878058766, 37.752933153948184], [-122.4062905998444, 37.75300225808476], [-122.40629525645737, 37.753083841944125], [-122.40632133349004, 37.75332524076086], [-122.40634275390974, 37.75356067911299], [-122.40640701516882, 37.75421390878142], [-122.4064146520141, 37.754292698673], [-122.4064418466339, 37.754581036148544], [-122.4064487384211, 37.754652189194836], [-122.40658862307494, 37.75610896399861], [-122.40659495606859, 37.75617564669644], [-122.4066009165332, 37.75624679974273], [-122.40664506122421, 37.75677113436396], [-122.40665139421786, 37.75684582643611], 
[-122.40666014865026, 37.75691567563077], [-122.40669702902504, 37.75721500271294], [-122.40670969501234, 37.75739307159319], [-122.40672552749646, 37.75756834650565], [-122.40673819348376, 37.75770785863045], [-122.40674471274193, 37.75778012926385], [-122.40675123200009, 37.75784550811005], [-122.40676855460038, 37.758021714345105], [-122.40687267646653, 37.759083794632936], [-122.40689204797653, 37.759306566997765], [-122.40689893976374, 37.75937604366338], [-122.40690434143478, 37.75945632367111], [-122.40699635610721, 37.76036808849204], [-122.40701814905594, 37.7605999878183], [-122.40702392325603, 37.7606692782194], [-122.40703193263036, 37.76074397029156], [-122.407047951379, 37.760907696803834], [-122.40708557681185, 37.76129475447449], [-122.40712096707048, 37.76165256861566], [-122.40714238749018, 37.76187478218693], [-122.40714797542574, 37.761942582471875], [-122.40715598480007, 37.762023980066715], [-122.40723514722069, 37.76293220586178], [-122.40725526378874, 37.76314249850383], [-122.40727817432459, 37.763173418414], [-122.40732250528013, 37.76321663178243], [-122.40734634713857, 37.7634701377929], [-122.407350444958, 37.76351465501296], [-122.4074109809267, 37.76414422908748], [-122.40741563753967, 37.76419284412696], [-122.40744394974658, 37.764486769538074], [-122.40746835039857, 37.76474102060663], [-122.40747058557281, 37.76476486246507], [-122.40747375206963, 37.76479727249139], [-122.40748399661818, 37.76490381579631], [-122.40749256478605, 37.76499229144288], [-122.40753056274795, 37.765388476074975], [-122.40754453258688, 37.76553432119342], [-122.407560365071, 37.76569935155733], [-122.40756781565177, 37.76578074915217], [-122.40757899152291, 37.76589288039266], [-122.40760078447165, 37.76611844672527], [-122.40766597705333, 37.766797380897344], [-122.40768460350523, 37.76701251641689], [-122.40769056396985, 37.76708162055347], [-122.40769857334416, 37.7671648807935], [-122.40771813111867, 37.767367722854786], [-122.40772204267357, 
37.76740832851995], [-122.40778406875843, 37.768054480136655], [-122.40779133307467, 37.76812954473785], [-122.40781368481696, 37.76836181659315], [-122.40782504695262, 37.76849741716305], [-122.4078315662108, 37.76857434440943], [-122.40779561715861, 37.76870025922433], [-122.40780511664909, 37.76883567352971], [-122.40781834142994, 37.76896382351884], [-122.40782541948167, 37.769033486448976], [-122.40782933103657, 37.76907018055924], [-122.40783715414636, 37.76914561768947], [-122.40790141540545, 37.769241171387755], [-122.40777456926796, 37.76938627144813], [-122.4077466295901, 37.76941644630022], [-122.40768516229879, 37.76948294273353], [-122.40764213519489, 37.769530440185896], [-122.40708632186993, 37.76999554269006], [-122.40699765995885, 37.77006986223317], [-122.40690806672517, 37.77014213286658], [-122.40686299071155, 37.7701784544478], [-122.40630158945102, 37.77063107722918], [-122.40554386538737, 37.77122619236766], [-122.40544589025033, 37.77130330587856], [-122.40536691409424, 37.77136458690534], [-122.40535909098445, 37.77137091989899], [-122.40448457906733, 37.77205842223894], [-122.40401854524058, 37.77242480454798]]
})

Now we do the same, except using the original (raw GPS) trajectory rather than the map-matched one.

val originalTraj = uberData.filter($"tripId" === 11721 || $"tripId" === 10858)
    .select($"latlon").cache
originalTraj: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [latlon: array<double>]
// Convert our Uber data points into GeoJSON geometries
import spray.json._  // brings in DefaultJsonProtocol and the toJson extension method
case class UberData(`type`: String = "MultiPoint",
                    coordinates: Array[(Double, Double)])

object UberJsonProtocol extends DefaultJsonProtocol {
  implicit val uberDataFormat = jsonFormat2(UberData)
}

import UberJsonProtocol._

val originalLatLon = originalTraj
  .map(r => r.getAs[scala.collection.mutable.WrappedArray[Double]]("latlon"))
  .map(point => (point(0), point(1))).collect

val originalJson = UberData(coordinates = originalLatLon).toJson.prettyPrint  // original, unmatched trajectories
defined class UberData
defined object UberJsonProtocol
import UberJsonProtocol._
originalLatLon: Array[(Double, Double)] = Array((-122.418327,37.769662), (-122.420465,37.752152), (-122.440099,37.755275), (-122.418033,37.769377), (-122.420542,37.752137), (-122.44031,37.75535), (-122.41801,37.769061), (-122.420881,37.7521), (-122.440255,37.755554), (-122.418315,37.76886), (-122.421396,37.75206), (-122.440142,37.755828), (-122.418724,37.768883), (-122.421987,37.75203), (-122.440138,37.756193), (-122.419014,37.769147), (-122.422569,37.752005), (-122.440149,37.756371), (-122.418971,37.769483), (-122.422693,37.752004), (-122.440359,37.756411), (-122.418511,37.769687), (-122.422818,37.752014), (-122.44069,37.756341), (-122.417832,37.769659), (-122.423185,37.751983), (-122.44086,37.756313), (-122.41706,37.769579), (-122.423685,37.751946), (-122.441048,37.756333), (-122.416215,37.769531), (-122.424169,37.751918), (-122.441334,37.756555), (-122.415324,37.769522), (-122.424708,37.751885), (-122.441897,37.756684), (-122.414432,37.76951), (-122.424958,37.751875), (-122.442435,37.756889), (-122.413484,37.769487), (-122.424984,37.751852), (-122.442802,37.757329), (-122.412444,37.769453), (-122.425247,37.751841), (-122.443158,37.757806), (-122.411387,37.769401), (-122.425729,37.751814), (-122.443619,37.758213), (-122.410386,37.769268), (-122.426196,37.751781), (-122.444145,37.758475), (-122.409464,37.769109), (-122.426694,37.751751), (-122.444577,37.758679), (-122.408661,37.768924), (-122.427177,37.751718), (-122.444832,37.759028), (-122.407822,37.76878), (-122.427336,37.751713), (-122.444837,37.759426), (-122.406908,37.768626), (-122.427581,37.7517), (-122.444577,37.759657), (-122.406076,37.768222), (-122.428053,37.75167), (-122.444514,37.759751), (-122.405508,37.767584), (-122.428398,37.751655), (-122.444658,37.75975), (-122.405369,37.76681), (-122.428542,37.751655), (-122.444871,37.759528), (-122.405278,37.765996), (-122.428965,37.751632), (-122.444939,37.759096), (-122.405222,37.765188), (-122.429399,37.751602), (-122.444754,37.758632), 
(-122.405166,37.764413), (-122.429542,37.751587), (-122.444395,37.758382), (-122.405225,37.763661), (-122.429883,37.751555), (-122.444256,37.758327), (-122.405476,37.762902), (-122.430437,37.751511), (-122.444097,37.758272), (-122.405923,37.762129), (-122.431027,37.751473), (-122.443826,37.758148), (-122.406369,37.761326), (-122.431551,37.751459), (-122.443456,37.757896), (-122.40657,37.760493), (-122.43174,37.751459), (-122.443045,37.757527), (-122.406438,37.759641), (-122.432052,37.751442), (-122.442714,37.757076), (-122.405972,37.758826), (-122.432547,37.751401), (-122.442393,37.756685), (-122.405289,37.758195), (-122.433013,37.751368), (-122.441899,37.756513), (-122.404561,37.757664), (-122.433527,37.75134), (-122.441371,37.756363), (-122.403931,37.757064), (-122.434029,37.751315), (-122.441165,37.755966), (-122.403497,37.756271), (-122.434547,37.751281), (-122.441389,37.75552), (-122.403226,37.754705), (-122.435054,37.751253), (-122.441737,37.755033), (-122.403133,37.753789), (-122.435511,37.751228), (-122.442034,37.754477), (-122.403107,37.753565), (-122.435945,37.751211), (-122.44236,37.754006), (-122.403045,37.752721), (-122.436152,37.7512), (-122.442662,37.753557), (-122.403134,37.751899), (-122.436356,37.751184), (-122.442805,37.753054), (-122.403397,37.751196), (-122.436847,37.751152), (-122.442904,37.752566), (-122.40367,37.750529), (-122.437464,37.751113), (-122.443011,37.752035), (-122.403916,37.74996), (-122.438115,37.751083), (-122.443169,37.751483), (-122.404188,37.749513), (-122.438337,37.75107), (-122.44333,37.750914), (-122.40475,37.749263), (-122.4385,37.751162), (-122.443481,37.750366), (-122.405186,37.749328), (-122.438535,37.75151), (-122.443608,37.74989), (-122.405446,37.749605), (-122.438569,37.751751), (-122.443739,37.749409), (-122.405722,37.750095), (-122.438591,37.751967), (-122.443995,37.749005), (-122.405981,37.750691), (-122.438646,37.752356), (-122.444297,37.748614), (-122.406143,37.751261), (-122.438655,37.752555), 
(-122.444418,37.748159), (-122.406192,37.751811), (-122.438666,37.752747), (-122.444341,37.747639), (-122.406246,37.752386), (-122.438701,37.753118), (-122.444515,37.747228), (-122.406331,37.752796), (-122.438718,37.75333), (-122.444731,37.747076), (-122.406367,37.752952), (-122.438702,37.753511), (-122.444758,37.747066), (-122.406436,37.752996), (-122.438716,37.753871), (-122.444928,37.747193), (-122.406774,37.753001), (-122.438745,37.75411), (-122.444944,37.747476), (-122.407123,37.752941), (-122.438767,37.75422), (-122.444931,37.747834), (-122.407234,37.752931), (-122.43883,37.754559), (-122.444924,37.748234), (-122.407593,37.752915), (-122.438898,37.755016), (-122.444938,37.748669), (-122.407987,37.752886), (-122.438914,37.755296), (-122.444866,37.749047), (-122.408148,37.752862), (-122.438889,37.755366), (-122.444752,37.749373), (-122.408531,37.752836), (-122.438864,37.755555), (-122.444881,37.749725), (-122.408922,37.752817), (-122.438884,37.755852), (-122.445259,37.750076), (-122.408958,37.752814), (-122.438931,37.756233), (-122.445615,37.750375), (-122.409201,37.752817), (-122.438969,37.756521), (-122.445711,37.750517), (-122.409683,37.752807), (-122.43901,37.756796), (-122.445731,37.75057), (-122.409911,37.752785), (-122.439033,37.757037), (-122.445745,37.750608), (-122.410194,37.752723), (-122.439219,37.757126), (-122.445524,37.750736), (-122.410691,37.752689), (-122.439402,37.757132), (-122.445241,37.751019), (-122.410904,37.752679), (-122.439496,37.75712), (-122.445072,37.751368), (-122.411205,37.752666), (-122.439774,37.757185), (-122.445045,37.751731), (-122.411634,37.752642), (-122.440133,37.757251), (-122.445205,37.752141), (-122.411762,37.752633), (-122.440426,37.757365), (-122.445419,37.752573), (-122.422617,37.770549), (-122.411999,37.752599), (-122.440673,37.757582), (-122.445379,37.752959), (-122.422602,37.770505), (-122.412454,37.752544), (-122.440823,37.757866), (-122.445243,37.753398), (-122.42258,37.770472), (-122.412856,37.752522), 
(-122.440919,37.758086), (-122.445065,37.753805), (-122.422556,37.770377), (-122.412994,37.752521), (-122.440957,37.758141), (-122.444786,37.754166), (-122.4225,37.770249), (-122.413365,37.752504), (-122.440916,37.758093), (-122.44464,37.754517), (-122.422447,37.770132), (-122.413775,37.752444), (-122.440873,37.758018), (-122.444673,37.754914), (-122.422431,37.770048), (-122.413914,37.752431), (-122.440795,37.757945), (-122.444669,37.755273), (-122.422408,37.770006), (-122.414045,37.752478), (-122.44073,37.757726), (-122.444413,37.755485), (-122.422391,37.76999), (-122.414408,37.752469), (-122.440588,37.757431), (-122.444293,37.755533), (-122.422259,37.769866), (-122.41493,37.752428), (-122.440358,37.757253), (-122.42187,37.769849), (-122.415443,37.752393), (-122.439989,37.757148), (-122.4214,37.769876), (-122.415963,37.752366), (-122.439585,37.757045), (-122.421024,37.769912), (-122.416165,37.75235), (-122.439192,37.757043), (-122.420889,37.76993), (-122.416402,37.752351), (-122.439015,37.756936), (-122.420832,37.769942), (-122.41688,37.75233), (-122.438978,37.756708), (-122.420759,37.769902), (-122.417439,37.752294), (-122.438946,37.756414), (-122.420504,37.769874), (-122.417951,37.752258), (-122.438909,37.756055), (-122.420153,37.769848), (-122.418225,37.752245), (-122.438872,37.755653), (-122.419805,37.769819), (-122.41827,37.752243), (-122.438862,37.755536), (-122.419466,37.769781), (-122.418785,37.752229), (-122.438864,37.755472), (-122.418976,37.769726), (-122.419291,37.752216), (-122.438915,37.755413), (-122.418672,37.769685), (-122.419473,37.752208), (-122.43896,37.755403), (-122.418487,37.769666), (-122.419781,37.752187), (-122.439242,37.755366), (-122.418453,37.769667), (-122.420279,37.752162), (-122.439668,37.755331))
originalJson: String =
{
  "type": "MultiPoint",
  "coordinates": [[-122.418327, 37.769662], [-122.420465, 37.752152], [-122.440099, 37.755275], [-122.418033, 37.769377], [-122.420542, 37.752137], [-122.44031, 37.75535], [-122.41801, 37.769061], [-122.420881, 37.7521], [-122.440255, 37.755554], [-122.418315, 37.76886], [-122.421396, 37.75206], [-122.440142, 37.755828], [-122.418724, 37.768883], [-122.421987, 37.75203], [-122.440138, 37.756193], [-122.419014, 37.769147], [-122.422569, 37.752005], [-122.440149, 37.756371], [-122.418971, 37.769483], [-122.422693, 37.752004], [-122.440359, 37.756411], [-122.418511, 37.769687], [-122.422818, 37.752014], [-122.44069, 37.756341], [-122.417832, 37.769659], [-122.423185, 37.751983], [-122.44086, 37.756313], [-122.41706, 37.769579], [-122.423685, 37.751946], [-122.441048, 37.756333], [-122.416215, 37.769531], [-122.424169, 37.751918], [-122.441334, 37.756555], [-122.415324, 37.769522], [-122.424708, 37.751885], [-122.441897, 37.756684], [-122.414432, 37.76951], [-122.424958, 37.751875], [-122.442435, 37.756889], [-122.413484, 37.769487], [-122.424984, 37.751852], [-122.442802, 37.757329], [-122.412444, 37.769453], [-122.425247, 37.751841], [-122.443158, 37.757806], [-122.411387, 37.769401], [-122.425729, 37.751814], [-122.443619, 37.758213], [-122.410386, 37.769268], [-122.426196, 37.751781], [-122.444145, 37.758475], [-122.409464, 37.769109], [-122.426694, 37.751751], [-122.444577, 37.758679], [-122.408661, 37.768924], [-122.427177, 37.751718], [-122.444832, 37.759028], [-122.407822, 37.76878], [-122.427336, 37.751713], [-122.444837, 37.759426], [-122.406908, 37.768626], [-122.427581, 37.7517], [-122.444577, 37.759657], [-122.406076, 37.768222], [-122.428053, 37.75167], [-122.444514, 37.759751], [-122.405508, 37.767584], [-122.428398, 37.751655], [-122.444658, 37.75975], [-122.405369, 37.76681], [-122.428542, 37.751655], [-122.444871, 37.759528], [-122.405278, 37.765996], [-122.428965, 37.751632], [-122.444939, 37.759096], [-122.405222, 37.765188], [-122.429399, 
37.751602], [-122.444754, 37.758632], [-122.405166, 37.764413], [-122.429542, 37.751587], [-122.444395, 37.758382], [-122.405225, 37.763661], [-122.429883, 37.751555], [-122.444256, 37.758327], [-122.405476, 37.762902], [-122.430437, 37.751511], [-122.444097, 37.758272], [-122.405923, 37.762129], [-122.431027, 37.751473], [-122.443826, 37.758148], [-122.406369, 37.761326], [-122.431551, 37.751459], [-122.443456, 37.757896], [-122.40657, 37.760493], [-122.43174, 37.751459], [-122.443045, 37.757527], [-122.406438, 37.759641], [-122.432052, 37.751442], [-122.442714, 37.757076], [-122.405972, 37.758826], [-122.432547, 37.751401], [-122.442393, 37.756685], [-122.405289, 37.758195], [-122.433013, 37.751368], [-122.441899, 37.756513], [-122.404561, 37.757664], [-122.433527, 37.75134], [-122.441371, 37.756363], [-122.403931, 37.757064], [-122.434029, 37.751315], [-122.441165, 37.755966], [-122.403497, 37.756271], [-122.434547, 37.751281], [-122.441389, 37.75552], [-122.403226, 37.754705], [-122.435054, 37.751253], [-122.441737, 37.755033], [-122.403133, 37.753789], [-122.435511, 37.751228], [-122.442034, 37.754477], [-122.403107, 37.753565], [-122.435945, 37.751211], [-122.44236, 37.754006], [-122.403045, 37.752721], [-122.436152, 37.7512], [-122.442662, 37.753557], [-122.403134, 37.751899], [-122.436356, 37.751184], [-122.442805, 37.753054], [-122.403397, 37.751196], [-122.436847, 37.751152], [-122.442904, 37.752566], [-122.40367, 37.750529], [-122.437464, 37.751113], [-122.443011, 37.752035], [-122.403916, 37.74996], [-122.438115, 37.751083], [-122.443169, 37.751483], [-122.404188, 37.749513], [-122.438337, 37.75107], [-122.44333, 37.750914], [-122.40475, 37.749263], [-122.4385, 37.751162], [-122.443481, 37.750366], [-122.405186, 37.749328], [-122.438535, 37.75151], [-122.443608, 37.74989], [-122.405446, 37.749605], [-122.438569, 37.751751], [-122.443739, 37.749409], [-122.405722, 37.750095], [-122.438591, 37.751967], [-122.443995, 37.749005], [-122.405981, 37.750691], 
[-122.438646, 37.752356], [-122.444297, 37.748614], [-122.406143, 37.751261], [-122.438655, 37.752555], [-122.444418, 37.748159], [-122.406192, 37.751811], [-122.438666, 37.752747], [-122.444341, 37.747639], [-122.406246, 37.752386], [-122.438701, 37.753118], [-122.444515, 37.747228], [-122.406331, 37.752796], [-122.438718, 37.75333], [-122.444731, 37.747076], [-122.406367, 37.752952], [-122.438702, 37.753511], [-122.444758, 37.747066], [-122.406436, 37.752996], [-122.438716, 37.753871], [-122.444928, 37.747193], [-122.406774, 37.753001], [-122.438745, 37.75411], [-122.444944, 37.747476], [-122.407123, 37.752941], [-122.438767, 37.75422], [-122.444931, 37.747834], [-122.407234, 37.752931], [-122.43883, 37.754559], [-122.444924, 37.748234], [-122.407593, 37.752915], [-122.438898, 37.755016], [-122.444938, 37.748669], [-122.407987, 37.752886], [-122.438914, 37.755296], [-122.444866, 37.749047], [-122.408148, 37.752862], [-122.438889, 37.755366], [-122.444752, 37.749373], [-122.408531, 37.752836], [-122.438864, 37.755555], [-122.444881, 37.749725], [-122.408922, 37.752817], [-122.438884, 37.755852], [-122.445259, 37.750076], [-122.408958, 37.752814], [-122.438931, 37.756233], [-122.445615, 37.750375], [-122.409201, 37.752817], [-122.438969, 37.756521], [-122.445711, 37.750517], [-122.409683, 37.752807], [-122.43901, 37.756796], [-122.445731, 37.75057], [-122.409911, 37.752785], [-122.439033, 37.757037], [-122.445745, 37.750608], [-122.410194, 37.752723], [-122.439219, 37.757126], [-122.445524, 37.750736], [-122.410691, 37.752689], [-122.439402, 37.757132], [-122.445241, 37.751019], [-122.410904, 37.752679], [-122.439496, 37.75712], [-122.445072, 37.751368], [-122.411205, 37.752666], [-122.439774, 37.757185], [-122.445045, 37.751731], [-122.411634, 37.752642], [-122.440133, 37.757251], [-122.445205, 37.752141], [-122.411762, 37.752633], [-122.440426, 37.757365], [-122.445419, 37.752573], [-122.422617, 37.770549], [-122.411999, 37.752599], [-122.440673, 37.757582], 
[-122.445379, 37.752959], [-122.422602, 37.770505], [-122.412454, 37.752544], [-122.440823, 37.757866], [-122.445243, 37.753398], [-122.42258, 37.770472], [-122.412856, 37.752522], [-122.440919, 37.758086], [-122.445065, 37.753805], [-122.422556, 37.770377], [-122.412994, 37.752521], [-122.440957, 37.758141], [-122.444786, 37.754166], [-122.4225, 37.770249], [-122.413365, 37.752504], [-122.440916, 37.758093], [-122.44464, 37.754517], [-122.422447, 37.770132], [-122.413775, 37.752444], [-122.440873, 37.758018], [-122.444673, 37.754914], [-122.422431, 37.770048], [-122.413914, 37.752431], [-122.440795, 37.757945], [-122.444669, 37.755273], [-122.422408, 37.770006], [-122.414045, 37.752478], [-122.44073, 37.757726], [-122.444413, 37.755485], [-122.422391, 37.76999], [-122.414408, 37.752469], [-122.440588, 37.757431], [-122.444293, 37.755533], [-122.422259, 37.769866], [-122.41493, 37.752428], [-122.440358, 37.757253], [-122.42187, 37.769849], [-122.415443, 37.752393], [-122.439989, 37.757148], [-122.4214, 37.769876], [-122.415963, 37.752366], [-122.439585, 37.757045], [-122.421024, 37.769912], [-122.416165, 37.75235], [-122.439192, 37.757043], [-122.420889, 37.76993], [-122.416402, 37.752351], [-122.439015, 37.756936], [-122.420832, 37.769942], [-122.41688, 37.75233], [-122.438978, 37.756708], [-122.420759, 37.769902], [-122.417439, 37.752294], [-122.438946, 37.756414], [-122.420504, 37.769874], [-122.417951, 37.752258], [-122.438909, 37.756055], [-122.420153, 37.769848], [-122.418225, 37.752245], [-122.438872, 37.755653], [-122.419805, 37.769819], [-122.41827, 37.752243], [-122.438862, 37.755536], [-122.419466, 37.769781], [-122.418785, 37.752229], [-122.438864, 37.755472], [-122.418976, 37.769726], [-122.419291, 37.752216], [-122.438915, 37.755413], [-122.418672, 37.769685], [-122.419473, 37.752208], [-122.43896, 37.755403], [-122.418487, 37.769666], [-122.419781, 37.752187], [-122.439242, 37.755366], [-122.418453, 37.769667], [-122.420279, 37.752162], [-122.439668, 
37.755331]]
}
  1. Display the result of a map-matched trajectory

val trajHTML = genLeafletHTML(mapMatchedTrajectories ++ Array(originalJson))
displayHTML(trajHTML) // zoom and play - orange dots are raw and azure dots are map-matched
Maps

Visualization & MapMatching (Further things one could do).

  • Show Direction of Travel

  • Get timestamps for points; currently GraphHopper map matching does not preserve this information.

  • Map the matched coordinates to OSM Way Ids. See here for how to extract OSM ids from the graph edges with GraphHopper; this does, however, require the 0.6 SNAPSHOT for it to work.

    Another potential way to do this is simply to reverse geocode with a service such as http://nominatim.openstreetmap.org/
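The reverse-geocoding idea above can be sketched as follows. This is a minimal illustration, assuming a helper of our own invention (`reverseGeocodeUrl` is not part of GraphHopper's or Nominatim's API): it only builds the request URL for one matched coordinate; a real lookup would issue an HTTP GET against it and parse the JSON response, subject to Nominatim's usage policy.

```scala
// Sketch (assumed helper, not a GraphHopper API): build a Nominatim
// reverse-geocoding request URL for one map-matched coordinate.
def reverseGeocodeUrl(lat: Double, lon: Double): String =
  s"http://nominatim.openstreetmap.org/reverse?format=json&lat=$lat&lon=$lon&zoom=17"

// A point from the matched trajectory above. Note the GeoJSON pairs are
// [longitude, latitude], so the order is flipped when calling.
val url = reverseGeocodeUrl(37.752959, -122.445379)
// pass `url` to an HTTP client and parse the returned JSON for the OSM way
```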

#curl -O https://download.bbbike.org/osm/bbbike/SanFrancisco/SanFrancisco.osm.pbf

The osm.pbf file below was downloaded from the above link, as Marina pointed out yesterday, and the following details were received via email:

your requested OpenStreetMap area 'San Francisco' was extracted from planet.osm
To download the file, please click on the following link:

  https://download.bbbike.org/osm/extract/planet_-122.529,37.724_-122.352,37.811.osm.pbf

The file will be available for the next 48 hours. Please download the
file as soon as possible.

 Name: San Francisco
 Coordinates: -122.529,37.724 x -122.352,37.811
 Script URL: https://extract.bbbike.org/?sw_lng=-122.529&sw_lat=37.724&ne_lng=-122.352&ne_lat=37.811&format=osm.pbf&city=San%20Francisco
 Square kilometre: 150
 Granularity: 100 (1.1 cm)
 Format: osm.pbf
 File size: 8.5 MB
 SHA256 checksum: 8fe277a3b23ebd5a612d21cc50a5287bae3a169867c631353e9a1da3963cd617
 MD5 checksum: 9d2c5650547623bbca1656db84efeb7d
 Last planet.osm database update: Thu May  3 05:46:46 2018 UTC
 License: OpenStreetMap License

Please read the extract online help for more informations:
https://extract.bbbike.org/extract.html

and the much smaller map has these details:

your requested OpenStreetMap area 'San Francisco' was extracted from planet.osm
To download the file, please click on the following link:

  https://download.bbbike.org/osm/extract/planet_-122.449,37.747_-122.397,37.772.osm.pbf

The file will be available for the next 48 hours. Please download the
file as soon as possible.

 Name: San Francisco
 Coordinates: -122.449,37.747 x -122.397,37.772
 Script URL: https://extract.bbbike.org/?sw_lng=-122.449&sw_lat=37.747&ne_lng=-122.397&ne_lat=37.772&format=osm.pbf&city=San%20Francisco
 Square kilometre: 12
 Granularity: 100 (1.1 cm)
 Format: osm.pbf
 File size: 1.3 MB
 SHA256 checksum: 4fa2c4137e9eabdacc840ebcd9f741470c617c43d4d852d528e1baa44d2fb190
 MD5 checksum: 38f2954459efa8d95f65a16f844adebf
# smaller SF osm.pbf file as the driver crashes with the above larger map
 curl -O https://download.bbbike.org/osm/bbbike/SanFrancisco/SanFrancisco.osm.pbf # nearly 17MB and too big for community edition...
#curl -O https://download.bbbike.org/osm/extract/planet_-122.529,37.724_-122.352,37.811.osm.pbf
#curl -O https://download.bbbike.org/osm/extract/planet_-122.449,37.747_-122.397,37.772.osm.gz # much smaller map of SF
# backups in progress here... http://lamastex.org/.../SanFrancisco_-122.529_37.724__-122.352_37.811.osm.pbf
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 20.0M  100 20.0M    0     0  22.8M      0 --:--:-- --:--:-- --:--:-- 22.7M
ls
SanFrancisco.osm.pbf
conf
derby.log
eventlogs
logs
dbutils.fs.mkdirs("dbfs:/files/graphhopper/osm/")
res1: Boolean = true
dbutils.fs.rm("dbfs:/datasets/graphhopper/osm/SanFrancisco.osm.pbf",recurse=true) // to remove any pre-existing file with same name in dbfs
res2: Boolean = false
dbutils.fs.mv("file:/databricks/driver/SanFrancisco.osm.pbf", "dbfs:/files/graphhopper/osm/SanFrancisco.osm.pbf") // too big for driver memory
//dbutils.fs.mv("file:/databricks/driver/planet_-122.529,37.724_-122.352,37.811.osm.pbf", "dbfs:/datasets/graphhopper/osm/SanFrancisco.osm.pbf")
//dbutils.fs.mv("file:/databricks/driver/planet_-122.449,37.747_-122.397,37.772.osm.gz", "dbfs:/files/graphhopper/osm/SanFranciscoSmall.osm.gz")
res3: Boolean = true
display(dbutils.fs.ls("dbfs:/files/graphhopper/osm"))
path name size
dbfs:/files/graphhopper/osm/SanFrancisco.osm.pbf SanFrancisco.osm.pbf 2.1059693e7
dbutils.fs.mkdirs("dbfs:/files/graphhopper/graphHopperData") // Where graphhopper will store its data
res5: Boolean = true

Process an OSM file, creating from it a GraphHopper Graph. The contents of this graph are then stored in the distributed filesystem to be accessed for later use. This ensures that the processing step only takes place once, and subsequent GraphHopper objects can simply read these files to start map matching.

val osmPath = "/dbfs/files/graphhopper/osm/SanFrancisco.osm.pbf"
val graphHopperPath = "/dbfs/files/graphhopper/graphHopperData"
osmPath: String = /dbfs/files/graphhopper/osm/SanFrancisco.osm.pbf
graphHopperPath: String = /dbfs/files/graphhopper/graphHopperData
val encoder = new CarFlagEncoder()
 
val hopper = new GraphHopper()
      .setStoreOnFlush(true)
      .setEncodingManager(new EncodingManager(encoder))
      .setOSMFile(osmPath)
      .setCHWeightings("shortest")
      .setGraphHopperLocation("graphhopper/")

hopper.importOrLoad()
encoder: com.graphhopper.routing.util.CarFlagEncoder = car
hopper: com.graphhopper.GraphHopper = com.graphhopper.GraphHopper@5d5ac119
res6: com.graphhopper.GraphHopper = com.graphhopper.GraphHopper@5d5ac119

Move the GraphHopper object to dbfs:

dbutils.fs.mv("file:/databricks/driver/graphhopper", "dbfs:/files/graphhopper/graphHopperData", recurse=true)
res7: Boolean = true
display(dbutils.fs.ls("dbfs:/files/graphhopper/graphHopperData"))
path name size
dbfs:/files/graphhopper/graphHopperData/edges edges 3145828.0
dbfs:/files/graphhopper/graphHopperData/geometry geometry 1048676.0
dbfs:/files/graphhopper/graphHopperData/location_index location_index 1048676.0
dbfs:/files/graphhopper/graphHopperData/names names 1048676.0
dbfs:/files/graphhopper/graphHopperData/nodes nodes 1048676.0
dbfs:/files/graphhopper/graphHopperData/nodes_ch_shortest_car nodes_ch_shortest_car 1048676.0
dbfs:/files/graphhopper/graphHopperData/properties properties 32868.0
dbfs:/files/graphhopper/graphHopperData/shortcuts_shortest_car shortcuts_shortest_car 3145828.0

ScaDaMaLe Course site and book

This notebook is originally from: (link not working)

  • https://cdn2.hubspot.net/hubfs/438089/notebooks/MobileSample.html

You can download a tiny sample dataset from here:

wget http://lamastex.org/datasets/public/geospatial/misc/mobile_sample.csv

The main purpose is to show how SQL can be used for geospatial data at the resolution of countries.

Mobile Sample Data (Sample)

This notebook contains various chart examples based on a sample mobile phone dataset.

  • Note that this dataset joins the mobile sample table and the country codes.

  • Notice that the country names do not match completely, hence the use of the case statement within the join.

wget http://lamastex.org/datasets/public/geospatial/misc/mobile_sample.csv
--2022-02-02 16:28:20--  http://lamastex.org/datasets/public/geospatial/misc/mobile_sample.csv
Resolving lamastex.org (lamastex.org)... 166.62.28.100
Connecting to lamastex.org (lamastex.org)|166.62.28.100|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1713 (1.7K) [text/csv]
Saving to: ‘mobile_sample.csv’

     0K .                                                     100%  175M=0s

2022-02-02 16:28:20 (175 MB/s) - ‘mobile_sample.csv’ saved [1713/1713]
pwd
/databricks/driver
dbutils.fs.mkdirs("dbfs:/datasets/mobile_sample")
res0: Boolean = true
dbutils.fs.cp("file:/databricks/driver/mobile_sample.csv", "dbfs:/datasets/mobile_sample/") // load into dbfs
res1: Boolean = true
display(dbutils.fs.ls("dbfs:/datasets/mobile_sample/"))
path name size
dbfs:/datasets/mobile_sample/mobile_sample.csv mobile_sample.csv 1713.0

Create SQL tables for each dataset.

CREATE TABLE mobile_sample USING com.databricks.spark.csv OPTIONS(path 'dbfs:/datasets/mobile_sample/mobile_sample.csv', header "true")
select * from mobile_sample
CountryCode3 Apple HTC ASUS LG DELL Huawei FujitsuToshibaMobileCommun Archos Casio Kyocera
ARE 2 1 0 0 0 0 0 0 0 0
ARG 0 0 3 0 0 0 0 0 0 0
AUS 20 27 0 8 0 0 0 0 0 0
AUT 1 4 0 1 0 0 0 0 0 0
BEL 0 1 0 1 0 0 0 0 0 0
BGD 2 0 0 0 0 0 0 0 0 0
BHS 25 0 0 0 0 0 0 0 0 0
BMU 6 0 0 0 0 0 0 0 0 0
BRA 6 3 0 1 0 0 0 0 0 0
BRN 1 0 0 0 0 0 0 0 0 0
CAN 1 12 0 46 1 0 0 0 0 0
CHE 10 1 0 0 0 0 0 0 0 0
CHN 0 58 0 0 0 0 0 0 0 0
CYP 1 2 0 0 0 0 0 0 0 0
CZE 0 1 0 0 0 0 0 0 0 0
DEU 1 20 0 22 0 0 0 0 0 0
DNK 1 3 0 0 0 0 0 0 0 0
EGY 0 0 0 0 0 1 0 0 0 0
ESP 0 18 0 11 0 0 0 0 0 0
ETH 4 0 0 4 0 0 0 0 0 0
FIN 0 0 0 1 0 0 0 0 0 0
FJI 3 0 0 0 0 0 0 0 0 0
FRA 4 32 0 8 0 0 0 0 0 0
GGY 26 0 0 0 0 0 0 0 0 0
GIB 2 0 0 0 0 0 0 0 0 0
GRC 0 0 0 1 0 0 0 0 0 0
GUM 13 0 0 0 0 0 0 0 0 0
HKG 0 1 0 5 0 0 0 0 0 0
HTI 28 0 0 0 0 0 0 0 0 0
HUN 0 1 0 1 0 0 0 0 0 0
IDN 0 4 0 0 0 0 0 0 0 0
IND 9 14 0 0 1 0 0 0 0 0
IRL 0 8 0 1 0 0 0 0 0 0
ITA 1 3 0 21 0 0 0 0 0 0
JAM 2 7 0 0 0 0 0 0 0 0
JPN 5 6 0 0 0 0 2 0 0 0
KAZ 6 0 0 0 0 0 0 0 0 0
KHM 2 0 0 0 0 0 0 0 0 0
LCA 13 0 0 0 0 0 0 0 0 0
LUX 3 0 0 0 0 0 0 0 0 0
LVA 1 0 0 0 0 0 0 0 0 0
MAR 1 0 0 0 0 0 0 0 0 0
MEX 0 0 0 16 0 0 0 0 0 0
MLT 11 0 0 0 0 0 0 0 0 0
MMR 1 0 0 0 0 0 0 0 0 0
MTQ 3 0 0 0 0 0 0 0 0 0
MUS 4 0 0 0 0 0 0 0 0 0
MYS 0 7 0 0 0 0 0 0 0 0
NGA 2 0 0 0 0 0 0 0 0 0
NLD 4 6 0 1 0 0 0 0 0 0
NOR 0 3 0 0 0 0 0 0 0 0
NPL 2 0 0 0 0 0 0 0 0 0
NZL 0 2 0 0 0 0 0 0 0 0
PAK 7 0 0 0 0 0 0 0 0 0
PHL 0 1 0 0 0 0 0 0 0 0
POL 0 3 0 0 0 0 0 0 0 0
RUS 0 2 0 0 0 0 0 0 0 0
SGP 1 0 0 0 0 0 0 0 0 0
SRB 1 1 0 0 0 0 0 0 0 0
SWE 1 0 0 3 0 0 0 0 0 0
THA 0 1 0 0 0 0 0 0 0 0
TUR 1 0 0 0 0 0 0 0 0 0
UKR 0 2 0 0 0 0 0 0 0 0
USA 21004 2554 42 7940 52 229 0 1 996 117
VNM 0 5 0 1 0 0 0 0 0 0
ZAF 2 1 0 0 0 0 0 0 0 0

The next cell doesn't work:

  • The mobile_sample table does not contain ClientID, DeviceMake or Country columns.

  • Data to create the country codes table is needed.

select m.ClientID, c.CountryCode3, m.DeviceMake 
from mobile_sample m 
   join countrycodes c 
      on m.Country = c.Country
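Since this join hinges on reconciling mismatched country names, here is a hedged sketch of the CASE-style normalisation it would need, written as a plain Scala function. The specific renames are illustrative assumptions about the kind of mismatch, not values taken from the (missing) countrycodes data.

```scala
// Sketch of the CASE-expression logic the join relies on: normalise country
// names on one side before matching. The concrete renames below are
// illustrative assumptions, not the actual table values.
def normalizeCountry(name: String): String = name match {
  case "United States" => "United States of America" // example spelling mismatch
  case "Russia"        => "Russian Federation"       // another assumed mismatch
  case other           => other                      // most names already agree
}
```

In Spark SQL the same logic would live in a `CASE WHEN ... THEN ... ELSE ... END` expression inside the join condition or a projected column.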
cache table mobile_sample
select DeviceMake, count(1) as DeviceCnt from mobile_sample where Country = 'United States' group by DeviceMake order by DeviceCnt desc limit 10
select m.clientid, s.StateCodes from mobile_sample m join state_codes s on s.state = m.state
select m.clientid, m.DeviceMake, s.StateCodes from mobile_sample m join state_codes s on s.state = m.state
clientid DeviceMake StateCodes
4688.0 RIM VA
4688.0 RIM VA
4688.0 RIM VA
5251.0 Apple VA
6056.0 Samsung VA
7130.0 SAMSUNG VA
7162.0 Samsung VA
7162.0 Samsung VA
9530.0 HTC VA
11561.0 Apple VA
... (output truncated: the full result repeats many more clientid / DeviceMake / StateCodes rows)
select clientid, DeviceMake from mobile_sample where Country = 'United States' AND DeviceMake IN ('Apple', 'Samsung', 'LG', 'RIM', 'HTC', 'Motorola');
clientid DeviceMake
8.0 Samsung
23.0 HTC
23.0 HTC
23.0 HTC
28.0 Motorola
28.0 Motorola
28.0 Motorola
28.0 Motorola
28.0 Motorola
28.0 Motorola
... (output truncated: the full result repeats many more clientid / DeviceMake rows)
983.0 Apple
983.0 Apple
983.0 Apple
983.0 Apple
983.0 Apple
983.0 Apple
983.0 Apple
983.0 Apple
983.0 Apple
983.0 Apple
983.0 Apple
983.0 Apple
983.0 Apple
983.0 Apple
983.0 Apple
983.0 Apple
983.0 Apple
983.0 Apple
983.0 Apple
986.0 LG
986.0 LG
995.0 Apple
995.0 Apple
995.0 Apple
995.0 Apple
995.0 Apple
995.0 Apple
995.0 Apple
995.0 Apple
995.0 Apple
995.0 Apple
995.0 Apple
995.0 Apple
995.0 Apple
995.0 Apple
995.0 Apple
995.0 Apple
995.0 Apple
999.0 Apple
999.0 Apple
1004.0 Apple
1005.0 LG
1005.0 LG
1005.0 LG
1005.0 LG
1035.0 Samsung
1035.0 Samsung
1035.0 Samsung
1035.0 Samsung
1035.0 Samsung
1035.0 Samsung
1035.0 Samsung
1044.0 LG
1044.0 LG
1044.0 LG
1051.0 Samsung
1051.0 Samsung
1055.0 HTC
1067.0 HTC
1090.0 Samsung
1090.0 Samsung
1090.0 Samsung
1090.0 Samsung
1097.0 Samsung
1101.0 Apple
1122.0 Apple
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1140.0 LG
1149.0 LG
1149.0 LG
1151.0 Apple
1152.0 HTC
1153.0 RIM
1153.0 RIM
1156.0 LG
1156.0 LG
1156.0 LG
1156.0 LG
1168.0 Samsung
1168.0 Samsung
1168.0 Samsung
1168.0 Samsung
1168.0 Samsung
1168.0 Samsung
1168.0 Samsung
1168.0 Samsung
1168.0 Samsung
1168.0 Samsung
1168.0 Samsung
1169.0 Apple
1181.0 Samsung
1181.0 Samsung
1181.0 Samsung
1200.0 Apple
1223.0 LG
1223.0 LG
1223.0 LG
1228.0 HTC
1228.0 HTC
1228.0 HTC
1228.0 HTC
1228.0 HTC
1228.0 HTC
1228.0 HTC
1228.0 HTC
1228.0 HTC
1228.0 HTC
1231.0 Apple
1231.0 Apple
1231.0 Apple
1231.0 Apple
1231.0 Apple
1231.0 Apple
1231.0 Apple
1231.0 Apple
1231.0 Apple
1231.0 Apple
1231.0 Apple
1231.0 Apple
1231.0 Apple
1231.0 Apple
1231.0 Apple
1231.0 Apple
1231.0 Apple
1231.0 Apple
1231.0 Apple
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1244.0 LG
1245.0 Motorola
1245.0 Motorola
1245.0 Motorola
1271.0 Samsung
1271.0 Samsung
1271.0 Samsung
1271.0 Samsung
1271.0 Samsung
1318.0 Samsung
1318.0 Samsung
1328.0 Apple
1328.0 Apple
1328.0 Apple
1328.0 Apple
1328.0 Apple
1329.0 LG
1329.0 LG
1371.0 Apple
1371.0 Apple
1371.0 Apple
1371.0 Apple
1371.0 Apple
1371.0 Apple
1371.0 Apple
1371.0 Apple
1371.0 Apple
1371.0 Apple
1371.0 Apple
1371.0 Apple
1371.0 Apple
1371.0 Apple
1371.0 Apple
1372.0 RIM
1372.0 RIM
1372.0 RIM
1377.0 Samsung
1377.0 Samsung
1377.0 Samsung
1385.0 LG
1385.0 LG
1396.0 Apple
1398.0 Apple
1418.0 Apple
1419.0 RIM
1419.0 RIM
1437.0 Apple
1443.0 LG
1451.0 LG
1451.0 LG
1451.0 LG
1451.0 LG
1451.0 LG
1463.0 LG
1463.0 LG
1463.0 LG
1463.0 LG
1479.0 Apple
1479.0 Apple
1479.0 Apple
1479.0 Apple
1479.0 Apple
1479.0 Apple
1479.0 Apple
1479.0 Apple
1479.0 Apple
1479.0 Apple
1479.0 Apple
1479.0 Apple
1479.0 Apple
1493.0 Apple
1493.0 Apple
1493.0 Apple
1498.0 RIM
1507.0 LG
1507.0 LG
1507.0 LG
1507.0 LG
1507.0 LG
1507.0 LG
1507.0 LG
1507.0 LG
1507.0 LG
1507.0 LG
1507.0 LG
1507.0 LG
1507.0 LG
1507.0 LG
1507.0 LG
1507.0 LG
1525.0 Samsung
1525.0 Samsung
1525.0 Samsung
1529.0 Apple
1535.0 Apple
1535.0 Apple
1535.0 Apple
1535.0 Apple
1544.0 Apple
1550.0 RIM
1550.0 RIM
1550.0 RIM
1550.0 RIM
1573.0 Samsung
1573.0 Samsung
1578.0 Samsung
1579.0 Apple
1589.0 Apple
1601.0 Samsung
1601.0 Samsung
1601.0 Samsung
1601.0 Samsung
1601.0 Samsung
1601.0 Samsung
1601.0 Samsung
1601.0 Samsung
1601.0 Samsung
1601.0 Samsung
1601.0 Samsung
1601.0 Samsung
1617.0 Apple
1622.0 Apple
1632.0 Apple
1632.0 Apple
1632.0 Apple
1656.0 Apple
1656.0 Apple
1656.0 Apple
1661.0 Samsung
1661.0 Samsung
1668.0 Apple
1676.0 LG
1694.0 Motorola
1743.0 Apple
1763.0 LG
1775.0 LG
1775.0 LG
1775.0 LG
1775.0 LG
1775.0 LG
1800.0 LG
1800.0 LG
1800.0 LG
1800.0 LG
1800.0 LG
1800.0 LG
1800.0 LG
1800.0 LG
1857.0 LG
1857.0 LG
1861.0 Apple
1862.0 RIM
1862.0 RIM
1862.0 RIM
1862.0 RIM
1862.0 RIM
1862.0 RIM
1862.0 RIM
1864.0 Apple
1864.0 Apple
1864.0 Apple
1864.0 Apple
1901.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1905.0 Samsung
1918.0 Apple
1935.0 Samsung
1935.0 Samsung
1941.0 HTC
1950.0 LG
1950.0 LG
1950.0 LG
1950.0 LG
1969.0 Apple
1985.0 Apple
1993.0 Samsung
1993.0 Samsung
1996.0 Samsung
1996.0 Samsung
1996.0 Samsung
2008.0 Apple
2008.0 Apple
2008.0 Apple
2008.0 Apple
2008.0 Apple
2013.0 Apple
2013.0 Apple
2013.0 Apple
2040.0 Motorola
2041.0 LG
2047.0 Apple
2047.0 Apple
2047.0 Apple
2048.0 Apple
2048.0 Apple
2048.0 Apple
2053.0 Apple
2053.0 Apple
2053.0 Apple
2053.0 Apple
2053.0 Apple
2053.0 Apple
2053.0 Apple
2053.0 Apple
2053.0 Apple
2068.0 LG
2068.0 LG
2068.0 LG
2068.0 LG
2068.0 LG
2074.0 Apple
2074.0 Apple
2100.0 Apple
2105.0 RIM
2112.0 HTC
2112.0 HTC
2112.0 HTC
2112.0 HTC
2112.0 HTC
2112.0 HTC
2112.0 HTC
2119.0 RIM
2119.0 RIM
2133.0 Apple
2147.0 Samsung
2147.0 Samsung
2171.0 Motorola
2173.0 Samsung
2173.0 Samsung
2173.0 Samsung
2173.0 Samsung
2173.0 Samsung
2173.0 Samsung
2173.0 Samsung
2173.0 Samsung
2178.0 RIM
2185.0 Samsung
2185.0 Samsung
2185.0 Samsung
2186.0 Apple
2186.0 Apple
2186.0 Apple
2186.0 Apple
2186.0 Apple
2186.0 Apple
2186.0 Apple
2186.0 Apple
2186.0 Apple
2186.0 Apple
2186.0 Apple
2186.0 Apple
2186.0 Apple
2186.0 Apple
2186.0 Apple
2191.0 Samsung
2191.0 Samsung
2191.0 Samsung
2191.0 Samsung

ScaDaMaLe Course site and book

Assignment

Ingest, explore, and play with this Kaggle dataset: https://www.kaggle.com/marcodena/mobile-phone-activity/version/1


Here are some resources worth looking at:

  • https://databricks.com/session/improving-traffic-prediction-using-weather-data
    • https://www.ibm.com/developerworks/
    • https://www.ibm.com/developerworks/community/blogs/jacquesroy/entry/talking_timeseries2?lang=en
  • https://databricks.com/blog/2017/05/09/detecting-abuse-scale-locality-sensitive-hashing-uber-engineering.html
  • https://dzone.com/articles/implementing-live-weather-reporting-with-hdfhorton
  • https://github.com/twosigma/flint
  • https://databricks.gitbooks.io/databricks-spark-reference-applications/content/timeseries/index.html
  • https://blog.cloudera.com/blog/2015/12/spark-ts-a-new-library-for-analyzing-time-series-data-with-apache-spark/
    • https://github.com/sryza/spark-timeseries


Notebooks structure and necessary libraries

Stavroula Rafailia Vlachou (LinkedIn), Virginia Jimenez Mohedano (LinkedIn) and Raazesh Sainudiin (LinkedIn).

This project was supported by SENSMETRY through a Data Science Project Internship 
between 2022-01-17 and 2022-06-05 to Stavroula R. Vlachou and Virginia J. Mohedano 
and Databricks University Alliance with infrastructure credits from AWS to 
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.

2022, Uppsala, Sweden

The common notebooks are:

1. 03301OSMtoGraphXUppsalaTiny: Construction of a road graph from OpenStreetMap (OSM) data with GraphX and finer partitions for a small area in Uppsala.

2. 03302OSMtoGraphX_LT: Construction of a road graph corresponding to Lithuania's road network from OSM data with GraphX. Ingestion of OSM data with methods from the osm-parquetizer project; suitable for big data. Further segmentation is also performed.

The project's open source code for Stavroula Rafailia's part is structured as follows:

1. 03401MapMatchingwithGeoMatch_UppsalaTiny: GeoMatch: map-matching OSM nodes to OSM ways (showcase).

2. 03402MapMatchingonaGraphUppsalaTiny: GeoMatch: map-matching OSM nodes to a road graph G0. The latter is constructed by a discretization of the road network provided by OSM.

3. 03403MapMatchingonaGraphLT: GeoMatch: map-matching events of interest (vehicle collisions) onto Lithuania's road graph G0. Revisit the end of this notebook after 034_06SimulatingArrivalTimesNHPP_Inversion for the generation of a location for each time variate simulated from the NHPP (non-homogeneous Poisson process).

4. 03404MapMatchingonaG1LT: GeoMatch: map-matching events of interest (vehicle collisions) onto Lithuania's coarsened road graph G1 (under a distance threshold of 100 meters).

5. 034_05DistributionOfStates: The conditional/posterior distributions of the states given a time unit, and the distribution of the states independent of time.

6. 03406SimulatingArrivalTimesNHPPInversion: Simulation of the arrival times of an NHPP from one or more realisations.

The project's open source code for Virginia's part is structured as follows:

1. 03501Arcgiscoordinatestransformation: Transformation of coordinates using the ArcGIS Runtime library.

2. 03502SegmentationmunicipalitiesMagellan: Magellan: locating the accidents within each municipality.

3. 03503Visualization_municipalities: Visualizations of accidents in municipalities using Python.

4. 03504MapMatching_intersections: GeoMatch: map-matching accidents with their closest intersection and measuring the distance between them.

5. 03505UndirectedG0: Undirected graph from the topological road graph created using OpenStreetMap (OSM) data.

6. 03506ConnectedComponent_PageRank: The connected-components algorithm is applied to the undirected G0, together with the PageRank algorithm.

7. 03507PoissonRegression: Poisson regression on the number of accidents based on different factors.

Maven libraries that need to be installed in the cluster

com.graphhopper:map-matching:0.6.0

io.spray:spray-json_2.11:1.3.4

org.openstreetmap.osmosis:osmosis-osm-binary:0.45

org.openstreetmap.osmosis:osmosis-pbf:0.45

org.openstreetmap.osmosis:osmosis-core:0.45

com.esri.geometry:esri-geometry-api:2.1.0

org.cusp.bdi.gm.GeoMatch
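Outside Databricks, the same Maven coordinates could be declared in an sbt build roughly as follows. This is a sketch, not the cluster configuration used in the course; the Scala version is an assumption inferred from the `spray-json_2.11` artifact, and GeoMatch (`org.cusp.bdi.gm.GeoMatch`) is a class from the GeoMatch project that may need to be attached as a jar rather than resolved from a Maven repository.

```scala
// build.sbt sketch (assumed Scala 2.11 to match spray-json_2.11)
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "com.graphhopper" % "map-matching" % "0.6.0",
  "io.spray" %% "spray-json" % "1.3.4",
  "org.openstreetmap.osmosis" % "osmosis-osm-binary" % "0.45",
  "org.openstreetmap.osmosis" % "osmosis-pbf" % "0.45",
  "org.openstreetmap.osmosis" % "osmosis-core" % "0.45",
  "com.esri.geometry" % "esri-geometry-api" % "2.1.0"
)
```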


Creating a road graph from OpenStreetMap (OSM) data with GraphX

Stavroula Rafailia Vlachou (LinkedIn), Virginia Jimenez Mohedano (LinkedIn) and Raazesh Sainudiin (LinkedIn).

This project was supported by SENSMETRY through a Data Science Project Internship 
between 2022-01-17 and 2022-06-05 to Stavroula R. Vlachou and Virginia J. Mohedano 
and Databricks University Alliance with infrastructure credits from AWS to 
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.

2022, Uppsala, Sweden

This project builds on top of the work of Dillon George (2016-2018).

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
ls /datasets/osm/uppsala
path name size
dbfs:/datasets/osm/uppsala/.uppsalaTinyR.pbf.node.parquet.crc .uppsalaTinyR.pbf.node.parquet.crc 172.0
dbfs:/datasets/osm/uppsala/.uppsalaTinyR.pbf.relation.parquet.crc .uppsalaTinyR.pbf.relation.parquet.crc 84.0
dbfs:/datasets/osm/uppsala/.uppsalaTinyR.pbf.way.parquet.crc .uppsalaTinyR.pbf.way.parquet.crc 84.0
dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf uppsalaTinyR.pbf 17867.0
dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf.node.parquet uppsalaTinyR.pbf.node.parquet 20829.0
dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf.relation.parquet uppsalaTinyR.pbf.relation.parquet 9394.0
dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf.way.parquet uppsalaTinyR.pbf.way.parquet 9542.0
dbfs:/datasets/osm/uppsala/uppsalaTinyV.osm.pbf uppsalaTinyV.osm.pbf 30606.0
import crosby.binary.osmosis.OsmosisReader

import org.apache.hadoop.mapreduce.{TaskAttemptContext, JobContext}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

import org.openstreetmap.osmosis.core.container.v0_6.EntityContainer
import org.openstreetmap.osmosis.core.domain.v0_6._
import org.openstreetmap.osmosis.core.task.v0_6.Sink

import sqlContext.implicits._

import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.Map
import scala.collection.JavaConversions._
import org.apache.spark.graphx._
import crosby.binary.osmosis.OsmosisReader
import org.apache.hadoop.mapreduce.{TaskAttemptContext, JobContext}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.openstreetmap.osmosis.core.container.v0_6.EntityContainer
import org.openstreetmap.osmosis.core.domain.v0_6._
import org.openstreetmap.osmosis.core.task.v0_6.Sink
import sqlContext.implicits._
import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.Map
import scala.collection.JavaConversions._
import org.apache.spark.graphx._
val allowableWays = Set(
  "motorway",
  "motorway_link",
  "trunk",
  "trunk_link",
  "primary",
  "primary_link",
  "secondary",
  "secondary_link",
  "tertiary",
  "tertiary_link",
  "living_street",
  "residential",
  "road",
  "construction",
  "motorway_junction"
)
allowableWays: scala.collection.immutable.Set[String] = Set(construction, primary_link, secondary_link, secondary, residential, trunk_link, tertiary_link, motorway_link, motorway, tertiary, road, trunk, living_street, primary, motorway_junction)
val fs = FileSystem.get(new Configuration())
val path = new Path("dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf")
val file = fs.open(path)

var nodes: ArrayBuffer[Node] = ArrayBuffer()
var ways: ArrayBuffer[Way] = ArrayBuffer()
var relations: ArrayBuffer[Relation] = ArrayBuffer()

val osmosisReader = new OsmosisReader(file)
  osmosisReader.setSink(new Sink {
    override def process(entityContainer: EntityContainer): Unit = {
      
      if (entityContainer.getEntity.getType != EntityType.Bound) {
        val entity = entityContainer.getEntity
        entity match {
          case node: Node => nodes += node
          case way: Way => {
            val tagSet = way.getTags.map(_.getValue).toSet
            if ( !(tagSet & allowableWays).isEmpty ) {
              // way has at least one tag of interest
              ways += way
            }
          }
          case relation: Relation => relations += relation
        }
      }
    }

    override def initialize(map: java.util.Map[String, AnyRef]): Unit = {
      nodes = ArrayBuffer()
      ways = ArrayBuffer()
      relations = ArrayBuffer()
    }

    override def complete(): Unit = {}

    override def release(): Unit = {} // required by the Sink interface in newer Osmosis versions (0.4.6+)
    
    def close(): Unit = {}
  })

osmosisReader.run() 
case class WayEntry(wayId: Long, tags: Array[String], nodes: Array[Long])
case class NodeEntry(nodeId: Long, latitude: Double, longitude: Double, tags: Array[String])
defined class WayEntry
defined class NodeEntry
//convert the nodes array to Dataset
val nodeDS = nodes.map{node => 
  NodeEntry(node.getId, 
       node.getLatitude, 
       node.getLongitude, 
       node.getTags.map(_.getValue).toArray
)}.toDS
nodeDS: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]
nodeDS.count()
res2: Long = 627
nodeDS.show(5, false)
+--------+------------------+------------------+----+
|nodeId  |latitude          |longitude         |tags|
+--------+------------------+------------------+----+
|312339  |59.856328500000004|17.6430124        |[]  |
|312352  |59.85636590000001 |17.6478229        |[]  |
|312353  |59.857437700000006|17.645897700000003|[]  |
|312363  |59.857601900000006|17.6432529        |[]  |
|25724030|59.857001200000006|17.6418004        |[]  |
+--------+------------------+------------------+----+
only showing top 5 rows
//convert the ways array to Dataset
val wayDS = ways.map(way => 
  WayEntry(way.getId,
      way.getTags.map(_.getValue).toArray,
      way.getWayNodes.map(_.getNodeId).toArray)
).toDS.cache
wayDS: org.apache.spark.sql.Dataset[WayEntry] = [wayId: bigint, tags: array<string> ... 1 more field]
wayDS.count()
res5: Long = 9
wayDS.show(9, false)
+---------+--------------------------------------------------------------------------+----------------------------------------------+
|wayId    |tags                                                                      |nodes                                         |
+---------+--------------------------------------------------------------------------+----------------------------------------------+
|4281074  |[living_street, Bredgränd, paving_stones]                                 |[25812013]                                    |
|73834008 |[4, secondary, 4, 40, Kungsgatan, asphalt]                                |[25734373, 312352, 3431600977]                |
|263934971|[living_street, 7, Dragarbrunnsgatan, paving_stones, sv:Dragarbrunnsgatan]|[3067700668, 312363]                          |
|263934973|[living_street, 7, Dragarbrunnsgatan, paving_stones, sv:Dragarbrunnsgatan]|[312363, 3067700665, 25735257, 3067700641]    |
|299906437|[4, secondary, 3, 2, 1, 40, Kungsgatan, asphalt]                          |[312353, 801437007, 2187779764, 25734373]     |
|302521477|[residential, Dragarbrunnsgatan, asphalt, sv:Dragarbrunnsgatan]           |[3067700641, 2206536285, 25734470, 2206536278]|
|302521479|[4, secondary, 3, 2, 1, 40, Kungsgatan, asphalt]                          |[455006648]                                   |
|393182257|[living_street, yes, Vretgränd, no, asphalt]                              |[3963994985, 25735257]                        |
|733389337|[4, secondary, 3, 2, 1, 40, Kungsgatan, asphalt]                          |[455006648, 1523899738, 312353]               |
+---------+--------------------------------------------------------------------------+----------------------------------------------+
import org.apache.spark.sql.functions.explode

val nodeCounts = wayDS
                    .select(explode('nodes).as("node"))
                    .groupBy('node).count

nodeCounts.show(5)
+----------+-----+
|      node|count|
+----------+-----+
|    312363|    2|
| 455006648|    2|
|  25812013|    1|
|3067700668|    1|
|  25735257|    2|
+----------+-----+
only showing top 5 rows

import org.apache.spark.sql.functions.explode
nodeCounts: org.apache.spark.sql.DataFrame = [node: bigint, count: bigint]
val intersectionNodes = nodeCounts.filter('count >= 2).select('node.alias("intersectionNode"))
intersectionNodes: org.apache.spark.sql.DataFrame = [intersectionNode: bigint]
intersectionNodes.count() //there are 6 intersections in this area 
res10: Long = 6
val true_intersections = intersectionNodes
true_intersections: org.apache.spark.sql.DataFrame = [intersectionNode: bigint]
true_intersections.count
res12: Long = 6
intersectionNodes.show()
+----------------+
|intersectionNode|
+----------------+
|          312363|
|       455006648|
|        25735257|
|        25734373|
|          312353|
|      3067700641|
+----------------+
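The intersection-detection step above can be mimicked on plain Scala collections, without Spark. The way data below is hypothetical, not the OSM sample used in this notebook; the point is only the rule that a node occurring in two or more ways is an intersection (mirroring `explode` + `groupBy` + `count` + `filter('count >= 2)`).

```scala
// Hypothetical ways, each a sequence of node ids.
val ways = Seq(
  Seq(1L, 2L, 3L),
  Seq(3L, 4L, 5L),
  Seq(5L, 6L, 1L)
)

// Count how many way-node occurrences each node has,
// like exploding the nodes column and grouping by node.
val nodeCounts: Map[Long, Int] =
  ways.flatten.groupBy(identity).map { case (n, occ) => n -> occ.size }

// Nodes appearing in >= 2 ways are intersections.
val intersections: Set[Long] =
  nodeCounts.collect { case (n, c) if c >= 2 => n }.toSet
// intersections == Set(1L, 3L, 5L)
```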
val distinctNodesWays = wayDS.flatMap(_.nodes).distinct //the distinct nodes within the ways 
distinctNodesWays: org.apache.spark.sql.Dataset[Long] = [value: bigint]
distinctNodesWays.printSchema
root
 |-- value: long (nullable = false)
distinctNodesWays.count()
res16: Long = 18
distinctNodesWays.show(5)
+----------+
|     value|
+----------+
|    312363|
| 455006648|
|  25812013|
|3067700668|
|  25735257|
+----------+
only showing top 5 rows
val wayNodes = nodeDS.as("nodes") //nodes that are in a way + nodes info from nodeDS
  .joinWith(distinctNodesWays.as("ways"), $"ways.value" === $"nodes.nodeId")
  .map(_._1).cache
wayNodes: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]
wayNodes.printSchema
root
 |-- nodeId: long (nullable = false)
 |-- latitude: double (nullable = false)
 |-- longitude: double (nullable = false)
 |-- tags: array (nullable = true)
 |    |-- element: string (containsNull = true)
wayNodes.count()
res20: Long = 18
wayNodes.show(5, false) //the nodes and their coordinates that participate in the ways 25734373, 312352
+--------+------------------+------------------+----+
|nodeId  |latitude          |longitude         |tags|
+--------+------------------+------------------+----+
|312352  |59.85636590000001 |17.6478229        |[]  |
|312353  |59.857437700000006|17.645897700000003|[]  |
|312363  |59.857601900000006|17.6432529        |[]  |
|25734373|59.8567674        |17.6471041        |[]  |
|25734470|59.8562881        |17.6456634        |[]  |
+--------+------------------+------------------+----+
only showing top 5 rows
wayDS.printSchema
root
 |-- wayId: long (nullable = false)
 |-- tags: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- nodes: array (nullable = true)
 |    |-- element: long (containsNull = false)
val intersectionSetVal = intersectionNodes.as[Long].collect.toSet; //turn intersectionNodes to Set 
intersectionSetVal: scala.collection.immutable.Set[Long] = Set(3067700641, 312363, 455006648, 312353, 25735257, 25734373)
//new 
import org.apache.spark.sql.functions.{collect_list, map, udf}
import org.apache.spark.sql.functions._

// Note: one could also use `getItem` methods here.
// We assume each "nodes" sequence contains at least one node.
// The first and last nodes of a way are not needed from the sequence itself:
// when combining with the original nodes we simply label them "true".

val remove_first_and_last = udf((x: Seq[Long]) => x.drop(1).dropRight(1))

val nodes = wayDS.
  select($"wayId", $"nodes").
  withColumn("node", explode($"nodes")).
  drop("nodes")

val get_first_and_last = udf((x: Seq[Long]) => {val first = x(0); val last = x.reverse(0); Array(first, last)})

val first_and_last_nodes = wayDS.
  select($"wayId", get_first_and_last($"nodes").as("nodes")).
  withColumn("node", explode($"nodes")).
  drop("nodes")

val fake_intersections = first_and_last_nodes.select($"node").distinct().withColumnRenamed("node", "value")

// // Turn intersection set into a dataset to join (all values must be unique)
// //val intersections = intersectionSetVal.toSeq.toDF("value")
val intersections = intersectionNodes.union(fake_intersections).distinct      //virginia
 
val wayNodesLocated = nodes.join(wayNodes, wayNodes.col("nodeId") === nodes.col("node")).select($"wayId", $"node", $"latitude", $"longitude")


// case class MappedWay(wayId: Long, labels: Seq[Map[Long, Boolean]])
case class MappedWay(wayId: Long, labels_located: Seq[Map[Long, (Boolean, Double, Double)]])


val maps = wayNodesLocated.join(intersections, 'node === 'intersectionNode, "left_outer").
  //left outer joins returns all rows from the left DataFrame/Dataset regardless of match found on the right dataset
    select($"wayId", $"node", $"intersectionNode".isNotNull.as("contains"), $"latitude", $"longitude").
   groupBy("wayId").agg(collect_list(map($"node", struct($"contains".as("_1"), $"latitude".as("_2"), $"longitude".as("_3")))).as("labels_located")).as[MappedWay] 
 

val combine = udf((nodes: Seq[Long], labels_located: Seq[scala.collection.immutable.Map[Long, (Boolean, Double, Double)]]) => {
  // If labels_located does not contain "node", it is a start/end node:
  // assign label = true, latitude = 0, longitude = 0 (placeholder coordinates; TODO: revisit later)
  val m = labels_located.map(_.toSeq).flatten.toMap

  nodes.map { node => (node, m.getOrElse(node, (true, 0D, 0D))) } //add structure

})


val strSchema = "array<struct<nodeId:long, nodeInfo:struct<label:boolean, latitude:double, longitude: double>>>"
val labeledWays = wayDS.join(maps, "wayId")
                     .select($"wayId", $"tags", combine($"nodes", $"labels_located").as("labeledNodes").cast(strSchema))
import org.apache.spark.sql.functions.{collect_list, map, udf}
import org.apache.spark.sql.functions._
remove_first_and_last: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,ArrayType(LongType,false),Some(List(ArrayType(LongType,false))))
nodes: org.apache.spark.sql.DataFrame = [wayId: bigint, node: bigint]
get_first_and_last: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,ArrayType(LongType,false),Some(List(ArrayType(LongType,false))))
first_and_last_nodes: org.apache.spark.sql.DataFrame = [wayId: bigint, node: bigint]
fake_intersections: org.apache.spark.sql.DataFrame = [value: bigint]
intersections: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [intersectionNode: bigint]
wayNodesLocated: org.apache.spark.sql.DataFrame = [wayId: bigint, node: bigint ... 2 more fields]
defined class MappedWay
maps: org.apache.spark.sql.Dataset[MappedWay] = [wayId: bigint, labels_located: array<map<bigint,struct<_1:boolean,_2:double,_3:double>>>]
combine: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,ArrayType(StructType(StructField(_1,LongType,false), StructField(_2,StructType(StructField(_1,BooleanType,false), StructField(_2,DoubleType,false), StructField(_3,DoubleType,false)),true)),true),Some(List(ArrayType(LongType,false), ArrayType(MapType(LongType,StructType(StructField(_1,BooleanType,false), StructField(_2,DoubleType,false), StructField(_3,DoubleType,false)),true),true))))
strSchema: String = array<struct<nodeId:long, nodeInfo:struct<label:boolean, latitude:double, longitude: double>>>
labeledWays: org.apache.spark.sql.DataFrame = [wayId: bigint, tags: array<string> ... 1 more field]
labeledWays.printSchema
root
 |-- wayId: long (nullable = false)
 |-- tags: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- labeledNodes: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- nodeId: long (nullable = true)
 |    |    |-- nodeInfo: struct (nullable = true)
 |    |    |    |-- label: boolean (nullable = true)
 |    |    |    |-- latitude: double (nullable = true)
 |    |    |    |-- longitude: double (nullable = true)
labeledWays.select("wayId", "labeledNodes").show(9, false)
+---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|wayId    |labeledNodes                                                                                                                                                                                                   |
+---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|393182257|[[3963994985, [true, 59.857381800000006, 17.645299100000003]], [25735257, [true, 59.8569759, 17.644382]]]                                                                                                      |
|733389337|[[455006648, [true, 59.857930700000004, 17.6450031]], [1523899738, [false, 59.8575528, 17.645685500000003]], [312353, [true, 59.857437700000006, 17.645897700000003]]]                                         |
|299906437|[[312353, [true, 59.857437700000006, 17.645897700000003]], [801437007, [false, 59.8571596, 17.6463952]], [2187779764, [false, 59.856883200000006, 17.6468947]], [25734373, [true, 59.8567674, 17.6471041]]]    |
|263934973|[[312363, [true, 59.857601900000006, 17.6432529]], [3067700665, [false, 59.8575443, 17.6433633]], [25735257, [true, 59.8569759, 17.644382]], [3067700641, [true, 59.856720800000005, 17.6448606]]]             |
|73834008 |[[25734373, [true, 59.8567674, 17.6471041]], [312352, [false, 59.85636590000001, 17.6478229]], [3431600977, [true, 59.85631480000001, 17.6479153]]]                                                            |
|302521479|[[455006648, [true, 59.857930700000004, 17.6450031]]]                                                                                                                                                          |
|302521477|[[3067700641, [true, 59.856720800000005, 17.6448606]], [2206536285, [false, 59.8563708, 17.645517400000003]], [25734470, [false, 59.8562881, 17.6456634]], [2206536278, [true, 59.85618040000001, 17.6458707]]]|
|263934971|[[3067700668, [true, 59.857640200000006, 17.6431843]], [312363, [true, 59.857601900000006, 17.6432529]]]                                                                                                       |
|4281074  |[[25812013, [true, 59.8578769, 17.641676]]]                                                                                                                                                                    |
+---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
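The `combine` UDF above can be exercised on plain collections to see the defaulting behaviour: nodes missing from the label map are way start/end nodes and fall back to `(true, 0.0, 0.0)`. The ids and coordinates below are hypothetical.

```scala
// Mirror of the combine UDF's logic on plain Scala collections.
val nodes = Seq(1L, 2L, 3L)
val labelsLocated: Seq[Map[Long, (Boolean, Double, Double)]] = Seq(
  Map(1L -> (true, 59.85, 17.64)),
  Map(2L -> (false, 59.86, 17.65))
)

// Flatten the per-node maps into a single lookup map.
val m = labelsLocated.flatMap(_.toSeq).toMap

// Node 3 is absent from the map: treated as a way endpoint
// with label = true and placeholder coordinates.
val combined = nodes.map(n => (n, m.getOrElse(n, (true, 0d, 0d))))
```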
case class Intersection(OSMId: Long , latitude: Double, longitude: Double, inBuf: ArrayBuffer[(Long, Double, Double)], outBuf: ArrayBuffer[(Long, Double, Double)])
defined class Intersection
val segmentedWays = labeledWays.map(way => {
  
  val labeledNodes = way.getAs[Seq[Row]]("labeledNodes").map{case Row(k: Long, Row(v: Boolean, w:Double, x:Double)) => (k, v,w,x)}.toSeq //labeledNodes: (nodeid, label, lat, long)
  val wayId = way.getAs[Long]("wayId")
  
  val indexedNodes: Seq[((Long, Boolean, Double, Double), Int)] = labeledNodes.zipWithIndex //appends an integer as an index to every labeledNodes in a way
  
  val intersections = ArrayBuffer[Intersection]()  
  
  val currentBuffer = ArrayBuffer[(Long, Double, Double)]()
  
  val way_length = labeledNodes.length //number of nodes in a way
  
  if (way_length == 1) {

    val intersect = new Intersection(labeledNodes(0)._1, labeledNodes(0)._3, labeledNodes(0)._4, ArrayBuffer((-1L, 0D, 0D)), ArrayBuffer((-1L, 0D, 0D))) //include lat and long info

    var result = Array((intersect.OSMId, intersect.latitude, intersect.longitude, intersect.inBuf.toArray, intersect.outBuf.toArray))
    (wayId, result) //return
  }
  else {
    indexedNodes.foreach{ case ((id, isIntersection, latitude, longitude), i) => // id is nodeId and isIntersection is the node label
      if (isIntersection) {
        val newEntry = new Intersection(id, latitude, longitude, currentBuffer.clone, ArrayBuffer[(Long, Double, Double)]())
        intersections += newEntry
        currentBuffer.clear
      }
      else {
        currentBuffer ++= Array((id, latitude, longitude))  // if the node is not an intersection, append its (nodeId, lat, long) to the current buffer
      }
      
      // At the end of the way, if the currentBuffer is non-empty,
      // append its contents to the last intersection's outBuf
      if (i == way_length - 1 && !currentBuffer.isEmpty) {  
        if (intersections.isEmpty){
        // no intersection in this way: emit a placeholder intersection (id -1) carrying the buffered nodes
        intersections += new Intersection(-1L, 0D, 0D, currentBuffer, ArrayBuffer[(Long, Double, Double)]())
        }
        else {
          intersections.last.outBuf ++= currentBuffer
        }
        currentBuffer.clear
      }
    }
    var result = intersections.map(i => (i.OSMId, i.latitude, i.longitude, i.inBuf.toArray, i.outBuf.toArray)).toArray  
    (wayId, result) 
  }
})

//segmentedWays contains two columns:
  //_1: wayId
  //_2: Array[(nodeId, latitude, longitude, inBuff, outBuff)] for each intersection node in the way
segmentedWays: org.apache.spark.sql.Dataset[(Long, Array[(Long, Double, Double, Array[(Long, Double, Double)], Array[(Long, Double, Double)])])] = [_1: bigint, _2: array<struct<_1:bigint,_2:double,_3:double,_4:array<struct<_1:bigint,_2:double,_3:double>>,_5:array<struct<_1:bigint,_2:double,_3:double>>>>]
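The per-way segmentation above can be sketched on a toy way in plain Scala (node ids, labels, and coordinates below are invented for illustration): nodes labeled as intersections become cut points, and the non-intersection nodes seen since the previous cut point land in the next cut point's buffer.

```scala
import scala.collection.mutable.ArrayBuffer

// toy labeled nodes: (nodeId, isIntersection, lat, lon) -- invented values
val labeledNodes = Seq(
  (1L, true,  59.85, 17.64),
  (2L, false, 59.86, 17.64),
  (3L, false, 59.86, 17.65),
  (4L, true,  59.87, 17.65)
)

case class Cut(id: Long, inBuf: List[Long])

val cuts = ArrayBuffer[Cut]()
val buffer = ArrayBuffer[Long]()
labeledNodes.foreach { case (id, isIntersection, _, _) =>
  if (isIntersection) { cuts += Cut(id, buffer.toList); buffer.clear() }
  else buffer += id
}

// node 4's inBuf holds the non-intersection nodes 2 and 3 encountered before it
println(cuts) // ArrayBuffer(Cut(1,List()), Cut(4,List(2, 3)))
```

The real cell above additionally handles single-node ways and a trailing buffer left over after the last intersection; this sketch shows only the core cut-point logic.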
val schema = "array<struct<nodeId:bigint,latitude:double,longitude:double,inBuff:array<struct<nodeId:bigint,latitude:double,longitude:double>>,outBuff:array<struct<nodeId:bigint,latitude:double,longitude:double>>>>"
segmentedWays.select($"_1".alias("wayId"), $"_2".cast(schema).alias("nodeInfo")).printSchema()
root
 |-- wayId: long (nullable = false)
 |-- nodeInfo: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- nodeId: long (nullable = true)
 |    |    |-- latitude: double (nullable = true)
 |    |    |-- longitude: double (nullable = true)
 |    |    |-- inBuff: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- nodeId: long (nullable = true)
 |    |    |    |    |-- latitude: double (nullable = true)
 |    |    |    |    |-- longitude: double (nullable = true)
 |    |    |-- outBuff: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- nodeId: long (nullable = true)
 |    |    |    |    |-- latitude: double (nullable = true)
 |    |    |    |    |-- longitude: double (nullable = true)

schema: String = array<struct<nodeId:bigint,latitude:double,longitude:double,inBuff:array<struct<nodeId:bigint,latitude:double,longitude:double>>,outBuff:array<struct<nodeId:bigint,latitude:double,longitude:double>>>>
segmentedWays.show(2, false)
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
|_1       |_2                                                                                                                                                         |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
|393182257|[[3963994985, 59.857381800000006, 17.645299100000003, [], []], [25735257, 59.8569759, 17.644382, [], []]]                                                  |
|733389337|[[455006648, 59.857930700000004, 17.6450031, [], []], [312353, 59.857437700000006, 17.645897700000003, [[1523899738, 59.8575528, 17.645685500000003]], []]]|
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
only showing top 2 rows
// Unwrap the nested structure of segmentedWays: one output row per intersection node
val waySegmentDS = segmentedWays
.flatMap(way => way._2.map(node => (way._1, node))) 
// for each (wayId, Array(intersectionNode)) emit one (wayId, intersectionNode) pair
waySegmentDS: org.apache.spark.sql.Dataset[(Long, (Long, Double, Double, Array[(Long, Double, Double)], Array[(Long, Double, Double)]))] = [_1: bigint, _2: struct<_1: bigint, _2: double ... 3 more fields>]
waySegmentDS.printSchema
root
 |-- _1: long (nullable = false)
 |-- _2: struct (nullable = true)
 |    |-- _1: long (nullable = false)
 |    |-- _2: double (nullable = false)
 |    |-- _3: double (nullable = false)
 |    |-- _4: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- _1: long (nullable = false)
 |    |    |    |-- _2: double (nullable = false)
 |    |    |    |-- _3: double (nullable = false)
 |    |-- _5: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- _1: long (nullable = false)
 |    |    |    |-- _2: double (nullable = false)
 |    |    |    |-- _3: double (nullable = false)
waySegmentDS.show(5, false)
+---------+----------------------------------------------------------------------------------------------------+
|_1       |_2                                                                                                  |
+---------+----------------------------------------------------------------------------------------------------+
|393182257|[3963994985, 59.857381800000006, 17.645299100000003, [], []]                                        |
|393182257|[25735257, 59.8569759, 17.644382, [], []]                                                           |
|733389337|[455006648, 59.857930700000004, 17.6450031, [], []]                                                 |
|733389337|[312353, 59.857437700000006, 17.645897700000003, [[1523899738, 59.8575528, 17.645685500000003]], []]|
|299906437|[312353, 59.857437700000006, 17.645897700000003, [], []]                                            |
+---------+----------------------------------------------------------------------------------------------------+
only showing top 5 rows
import scala.collection.immutable.Map
import scala.collection.immutable.Map
// map each intersection node to its coordinates and a Map from each way it appears in to that way's (inBuff, outBuff)
val intersectionVertices = waySegmentDS
  .map(way => 
   //nodeId     latitude   longitude      wayId      inBuff      outBuff
   (way._2._1, (way._2._2, way._2._3, Map(way._1 -> (way._2._4, way._2._5))))) 
  .rdd
  //                     latitude, long, Map(wayId, inBuff, outBuff)
  .reduceByKey((a,b) => (a._1,     a._2, a._3 ++ b._3)) 

//intersectionVertices =  RDD[(nodeId, (latitude, longitude, wayMap(wayId -> inBuff, outBuff)))]
intersectionVertices: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = ShuffledRDD[259] at reduceByKey at command-588572986432353:8
intersectionVertices.map(vertex => (vertex._1, vertex._2._1, vertex._2._2)).toDF("vertexId", "latitude", "longitude").write.mode("overwrite").parquet("dbfs:/graphs/uppsala/vertices")
intersectionVertices.count()
res32: Long = 11
intersectionVertices.take(10)
res33: Array[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = Array((25812013,(59.8578769,17.641676,Map(4281074 -> (Array((-1,0.0,0.0)),Array((-1,0.0,0.0)))))), (455006648,(59.857930700000004,17.6450031,Map(733389337 -> (Array(),Array()), 302521479 -> (Array((-1,0.0,0.0)),Array((-1,0.0,0.0)))))), (25735257,(59.8569759,17.644382,Map(393182257 -> (Array(),Array()), 263934973 -> (Array((3067700665,59.8575443,17.6433633)),Array())))), (3431600977,(59.85631480000001,17.6479153,Map(73834008 -> (Array((312352,59.85636590000001,17.6478229)),Array())))), (3963994985,(59.857381800000006,17.645299100000003,Map(393182257 -> (Array(),Array())))), (3067700641,(59.856720800000005,17.6448606,Map(263934973 -> (Array(),Array()), 302521477 -> (Array(),Array())))), (312353,(59.857437700000006,17.645897700000003,Map(733389337 -> (Array((1523899738,59.8575528,17.645685500000003)),Array()), 299906437 -> (Array(),Array())))), (312363,(59.857601900000006,17.6432529,Map(263934973 -> (Array(),Array()), 263934971 -> (Array(),Array())))), (3067700668,(59.857640200000006,17.6431843,Map(263934971 -> (Array(),Array())))), (25734373,(59.8567674,17.6471041,Map(299906437 -> (Array((801437007,59.8571596,17.6463952), (2187779764,59.856883200000006,17.6468947)),Array()), 73834008 -> (Array(),Array())))))
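The reduceByKey above merges the single-entry way maps of a node that occurs in several ways. Its merge semantics can be sketched with plain Scala maps (way ids and buffer placeholders below are invented):

```scala
// a node appearing in two ways yields two records, each with a single-entry way map
val a = (59.857, 17.645, Map(733389337L -> "buffsA"))
val b = (59.857, 17.645, Map(299906437L -> "buffsB"))

// the reduce function keeps the coordinates and unions the way maps
val merged = (a._1, a._2, a._3 ++ b._3)
println(merged._3.keySet) // Set(733389337, 299906437)
```

Since keys are distinct wayIds, `++` never overwrites an entry here; a shared node simply accumulates one map entry per way it belongs to.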
val edges = segmentedWays
  .filter(way => way._2.length > 1) // keep only ways with more than one intersection
  .flatMap{ case (wayId, nodes_info) => {  
             nodes_info.sliding(2) // For each way it takes nodes in pairs
               .flatMap(segment => // segment is a pair of consecutive intersection nodes
                   List(Edge(segment(0)._1, segment(1)._1, wayId))
               )
   }}
edges: org.apache.spark.sql.Dataset[org.apache.spark.graphx.Edge[Long]] = [srcId: bigint, dstId: bigint ... 1 more field]
edges.map(edge => (edge.srcId, edge.dstId)).toDF("src","dst").write.mode("overwrite").parquet("dbfs:/graphs/uppsala/edges")
edges.printSchema
root
 |-- srcId: long (nullable = false)
 |-- dstId: long (nullable = false)
 |-- attr: long (nullable = false)
edges.count
res35: Long = 8
val roadGraph = Graph(intersectionVertices, edges.rdd).cache

//intersectionVertices =  RDD[(nodeId, (latitude, longitude, wayMap(wayId -> inBuff, outBuff)))]
//edges = srcId, dstId, attribute (attribute is the wayId)
roadGraph: org.apache.spark.graphx.Graph[(Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]),Long] = org.apache.spark.graphx.impl.GraphImpl@4114626
roadGraph.edges.take(10).foreach(println)
Edge(3963994985,25735257,393182257)
Edge(455006648,312353,733389337)
Edge(312353,25734373,299906437)
Edge(312363,25735257,263934973)
Edge(25735257,3067700641,263934973)
Edge(25734373,3431600977,73834008)
Edge(3067700641,2206536278,302521477)
Edge(3067700668,312363,263934971)
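The edge construction relies on `sliding(2)` producing consecutive pairs of intersection nodes along a way; a minimal sketch with invented node ids:

```scala
// intersection node ids along one way (invented); sliding(2) yields consecutive pairs
val nodes = Seq(10L, 20L, 30L)
val pairs = nodes.sliding(2).map(p => (p(0), p(1))).toList
println(pairs) // List((10,20), (20,30))
```

Each pair becomes one graph edge whose attribute is the wayId, so a way with n intersections contributes n - 1 edges.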
package d3
// We use a package object so that we can define top level classes like Edge that need to be used in other cells
// This was modified by Ivan Sadikov to make sure it is compatible with the latest Databricks notebook

import org.apache.spark.sql._
import com.databricks.backend.daemon.driver.EnhancedRDDFunctions.displayHTML

case class Edge(src: String, dest: String, count: Long)

case class Node(name: String)
case class Link(source: Int, target: Int, value: Long)
case class Graph(nodes: Seq[Node], links: Seq[Link])

object graphs {
// val sqlContext = SQLContext.getOrCreate(org.apache.spark.SparkContext.getOrCreate())  /// fix
val sqlContext = SparkSession.builder().getOrCreate().sqlContext
import sqlContext.implicits._
  
def force(clicks: Dataset[Edge], height: Int = 100, width: Int = 960): Unit = {
  val data = clicks.collect()
  val nodes = (data.map(_.src) ++ data.map(_.dest)).map(_.replaceAll("_", " ")).toSet.toSeq.map(Node)
  val links = data.map { t =>
    Link(nodes.indexWhere(_.name == t.src.replaceAll("_", " ")), nodes.indexWhere(_.name == t.dest.replaceAll("_", " ")), t.count / 20 + 1)
  }
  showGraph(height, width, Seq(Graph(nodes, links)).toDF().toJSON.first())
}

/**
 * Displays a force directed graph using d3
 * input: {"nodes": [{"name": "..."}], "links": [{"source": 1, "target": 2, "value": 0}]}
 */
def showGraph(height: Int, width: Int, graph: String): Unit = {

displayHTML(s"""
<style>

.node_circle {
  stroke: #777;
  stroke-width: 1.3px;
}

.node_label {
  pointer-events: none;
}

.link {
  stroke: #777;
  stroke-opacity: .2;
}

.node_count {
  stroke: #777;
  stroke-width: 1.0px;
  fill: #999;
}

text.legend {
  font-family: Verdana;
  font-size: 13px;
  fill: #000;
}

.node text {
  font-family: "Helvetica Neue","Helvetica","Arial",sans-serif;
  font-size: 17px;
  font-weight: 200;
}

</style>

<div id="clicks-graph">
<script src="//d3js.org/d3.v3.min.js"></script>
<script>

var graph = $graph;

var width = $width,
    height = $height;

var color = d3.scale.category20();

var force = d3.layout.force()
    .charge(-700)
    .linkDistance(180)
    .size([width, height]);

var svg = d3.select("#clicks-graph").append("svg")
    .attr("width", width)
    .attr("height", height);
    
force
    .nodes(graph.nodes)
    .links(graph.links)
    .start();

var link = svg.selectAll(".link")
    .data(graph.links)
    .enter().append("line")
    .attr("class", "link")
    .style("stroke-width", function(d) { return Math.sqrt(d.value); });

var node = svg.selectAll(".node")
    .data(graph.nodes)
    .enter().append("g")
    .attr("class", "node")
    .call(force.drag);

node.append("circle")
    .attr("r", 10)
    .style("fill", function (d) {
    if (d.name.startsWith("other")) { return color(1); } else { return color(2); };
})

node.append("text")
      .attr("dx", 10)
      .attr("dy", ".35em")
      .text(function(d) { return d.name });
      
//Now we are giving the SVGs co-ordinates - the force layout is generating the co-ordinates which this code is using to update the attributes of the SVG elements
force.on("tick", function () {
    link.attr("x1", function (d) {
        return d.source.x;
    })
        .attr("y1", function (d) {
        return d.source.y;
    })
        .attr("x2", function (d) {
        return d.target.x;
    })
        .attr("y2", function (d) {
        return d.target.y;
    });
    d3.selectAll("circle").attr("cx", function (d) {
        return d.x;
    })
        .attr("cy", function (d) {
        return d.y;
    });
    d3.selectAll("text").attr("x", function (d) {
        return d.x;
    })
        .attr("y", function (d) {
        return d.y;
    });
});
</script>
</div>
""")
}
  
  def help() = {
displayHTML("""
<p>
Produces a force-directed graph given a collection of edges of the following form:</br>
<tt><font color="#a71d5d">case class</font> <font color="#795da3">Edge</font>(<font color="#ed6a43">src</font>: <font color="#a71d5d">String</font>, <font color="#ed6a43">dest</font>: <font color="#a71d5d">String</font>, <font color="#ed6a43">count</font>: <font color="#a71d5d">Long</font>)</tt>
</p>
<p>Usage:<br/>
<tt><font color="#a71d5d">import</font> <font color="#ed6a43">d3._</font></tt><br/>
<tt><font color="#795da3">graphs.force</font>(</br>
&nbsp;&nbsp;<font color="#ed6a43">height</font> = <font color="#795da3">500</font>,<br/>
&nbsp;&nbsp;<font color="#ed6a43">width</font> = <font color="#795da3">500</font>,<br/>
&nbsp;&nbsp;<font color="#ed6a43">clicks</font>: <font color="#795da3">Dataset</font>[<font color="#795da3">Edge</font>])</tt>
</p>""")
  }
}
Warning: classes defined within packages cannot be redefined without a cluster restart.
Compilation successful.
import d3._
import org.apache.spark.sql.functions.lit
val G0 = roadGraph.edges.toDF().select($"srcId".as("src"), $"dstId".as("dest"),  lit(1L).as("count"))

d3.graphs.force(
  height = 800,
  width = 800,
  clicks = G0.as[d3.Edge])

import com.esri.core.geometry.GeometryEngine.geodesicDistanceOnWGS84
import com.esri.core.geometry.Point
import com.esri.core.geometry.GeometryEngine.geodesicDistanceOnWGS84
import com.esri.core.geometry.Point
val weightedRoadGraph = roadGraph.mapTriplets{triplet => //mapTriplets gives EdgeTriplet https://spark.apache.org/docs/2.3.1/api/java/org/apache/spark/graphx/EdgeTriplet.html
  def dist(lat1: Double, long1: Double, lat2: Double, long2: Double): Double = {
    val p1 = new Point(long1, lat1)
    val p2 = new Point(long2, lat2)
    geodesicDistanceOnWGS84(p1, p2)
  }
  
  //A triplet represents an edge along with the vertex attributes of its neighboring vertices (srcAttr, dstAttr)
  //triplet.attr is the same as edge.attr
  val wayNodesInBuff = triplet.dstAttr._3(triplet.attr)._1 //dstAttr is the vertex attribute (latitude, longitude, wayMap(wayId -> inBuff, outBuff))
  // inBuff -> array(nodeId, lat, long)
  
  if (wayNodesInBuff.isEmpty) {
      (triplet.attr, dist(triplet.srcAttr._1, triplet.srcAttr._2, triplet.dstAttr._1, triplet.dstAttr._2))
  
  } else {
      var distance: Double = 0.0

      //adds the distance between the src node and the first node in the inBuff
      distance += dist(triplet.srcAttr._1, triplet.srcAttr._2, wayNodesInBuff(0)._2, wayNodesInBuff(0)._3 )
    
     //more than one node in the inBuffer
      if (wayNodesInBuff.length > 1) {
        //adds the distance between every pair of nodes inside the inBuffer 
        distance += wayNodesInBuff.sliding(2).map{
        buff => dist(buff(0)._2, buff(0)._3, buff(1)._2, buff(1)._3)}
        .reduce(_ + _)
     }
    
      //adds the distance between the dst node and the last node in the inBuff
      distance += dist(wayNodesInBuff.last._2, wayNodesInBuff.last._3, triplet.dstAttr._1, triplet.dstAttr._2)

      (triplet.attr, distance)
    }
  
}.cache
weightedRoadGraph: org.apache.spark.graphx.Graph[(Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]),(Long, Double)] = org.apache.spark.graphx.impl.GraphImpl@9eb578f
weightedRoadGraph.edges.count()
res36: Long = 8
weightedRoadGraph.edges.take(8).foreach(println)
Edge(3963994985,25735257,(393182257,68.4570414333903))
Edge(455006648,312353,(733389337,74.36517408391025))
Edge(312353,25734373,(299906437,100.7353398484194))
Edge(312363,25735257,(263934973,94.17321564547117))
Edge(25735257,3067700641,(263934973,39.0782384063323))
Edge(25734373,3431600977,(73834008,67.891710670905))
Edge(3067700641,2206536278,(302521477,82.6456450149808))
Edge(3067700668,312363,(263934971,5.743347106374985))
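The edge weight above is the sum of leg distances src -> inBuff nodes -> dst, computed with esri's geodesicDistanceOnWGS84. The same summation pattern can be sketched with a plain haversine approximation (not the esri call used above, and with invented coordinates):

```scala
// a haversine great-circle approximation (NOT the esri geodesic call used in the cell above)
def haversine(lat1: Double, lon1: Double, lat2: Double, lon2: Double): Double = {
  val R = 6371000.0 // mean Earth radius in metres
  val dLat = math.toRadians(lat2 - lat1)
  val dLon = math.toRadians(lon2 - lon1)
  val a = math.pow(math.sin(dLat / 2), 2) +
    math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) * math.pow(math.sin(dLon / 2), 2)
  2 * R * math.asin(math.sqrt(a))
}

// edge weight = sum of leg distances along the way: src -> inBuff nodes -> dst (invented points)
val legs = Seq((59.8574, 17.6453), (59.8572, 17.6448), (59.8570, 17.6444))
val total = legs.sliding(2).map { case Seq(p, q) => haversine(p._1, p._2, q._1, q._2) }.sum
println(f"$total%.1f m")
```

Haversine assumes a spherical Earth, so its result differs slightly from the WGS84 geodesic distances printed above, but the sliding-pair summation is the same.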
weightedRoadGraph.vertices.count()
res38: Long = 11
weightedRoadGraph.vertices.map(node => node._1).take(11)
res39: Array[org.apache.spark.graphx.VertexId] = Array(25812013, 455006648, 25735257, 3431600977, 3963994985, 3067700641, 312353, 312363, 3067700668, 25734373, 2206536278)
weightedRoadGraph.vertices.take(11)
res40: Array[(org.apache.spark.graphx.VertexId, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = Array((25812013,(59.8578769,17.641676,Map(4281074 -> (Array((-1,0.0,0.0)),Array((-1,0.0,0.0)))))), (455006648,(59.857930700000004,17.6450031,Map(733389337 -> (Array(),Array()), 302521479 -> (Array((-1,0.0,0.0)),Array((-1,0.0,0.0)))))), (25735257,(59.8569759,17.644382,Map(393182257 -> (Array(),Array()), 263934973 -> (Array((3067700665,59.8575443,17.6433633)),Array())))), (3431600977,(59.85631480000001,17.6479153,Map(73834008 -> (Array((312352,59.85636590000001,17.6478229)),Array())))), (3963994985,(59.857381800000006,17.645299100000003,Map(393182257 -> (Array(),Array())))), (3067700641,(59.856720800000005,17.6448606,Map(263934973 -> (Array(),Array()), 302521477 -> (Array(),Array())))), (312353,(59.857437700000006,17.645897700000003,Map(733389337 -> (Array((1523899738,59.8575528,17.645685500000003)),Array()), 299906437 -> (Array(),Array())))), (312363,(59.857601900000006,17.6432529,Map(263934973 -> (Array(),Array()), 263934971 -> (Array(),Array())))), (3067700668,(59.857640200000006,17.6431843,Map(263934971 -> (Array(),Array())))), (25734373,(59.8567674,17.6471041,Map(299906437 -> (Array((801437007,59.8571596,17.6463952), (2187779764,59.856883200000006,17.6468947)),Array()), 73834008 -> (Array(),Array())))), (2206536278,(59.85618040000001,17.6458707,Map(302521477 -> (Array((2206536285,59.8563708,17.645517400000003), (25734470,59.8562881,17.6456634)),Array())))))
import org.apache.spark.graphx.{Edge => Edges}
val splittedEdges = weightedRoadGraph.triplets.flatMap{triplet => {
  def dist(lat1: Double, long1: Double, lat2: Double, long2: Double): Double = {
    val p1 = new Point(long1, lat1)
    val p2 = new Point(long2, lat2)
    geodesicDistanceOnWGS84(p1, p2)
  }
  val maxDist = 200
  var finalResult = Array[(Edges[(Long,  Double)], (Long, (Double, Double, Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])])), (Long, (Double, Double, Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])])))]()
  
  if(triplet.attr._2 > maxDist){                            
    val wayId = triplet.attr._1
    var wayNodesBuff = triplet.dstAttr._3(wayId)._1 
    var wayNodesBuffSize = wayNodesBuff.length
    
    if(wayNodesBuffSize > 0){
      var previousSrc = triplet.srcId

      var distance: Double = 0.0
      var currentBuff = Array[(Long, Double, Double)]()
      
      distance += dist(triplet.srcAttr._1, triplet.srcAttr._2, wayNodesBuff(0)._2, wayNodesBuff(0)._3) 
      
      var newVertex = (triplet.srcId, triplet.srcAttr)
      var previousVertex = newVertex
      
      if (distance > maxDist){
        newVertex = (wayNodesBuff(0)._1, (wayNodesBuff(0)._2, wayNodesBuff(0)._3, Map(wayId -> (Array[(Long, Double, Double)](), Array[(Long, Double, Double)]()))))
            
        finalResult +:= (Edges(previousSrc, wayNodesBuff(0)._1, (wayId, distance)), previousVertex, newVertex) 
        
        previousVertex = newVertex
        
        distance = 0
        previousSrc = wayNodesBuff(0)._1
      }
      else 
      {
        currentBuff +:= wayNodesBuff(0)
      }
         
      //loop through pairs of nodes in the way (in the buffer)
      if (wayNodesBuff.length > 1){
      wayNodesBuff.sliding(2).foreach{segment => {
        
        val tmp_dst = distance
        distance += dist(segment(0)._2, segment(0)._3, segment(1)._2, segment(1)._3)
        
        if (distance > maxDist)
        {
          if(segment(0)._1 != previousSrc){
              //      Vertex(nodeId,      (lat,                long,     Map(wayId->inBuff, outBuff)))
            newVertex = (segment(0)._1, (segment(0)._2, segment(0)._3, Map(wayId -> (currentBuff, Array[(Long, Double, Double)]()))) )

            //adds the edge to the array
            finalResult +:= (Edges(previousSrc, segment(0)._1, (wayId, tmp_dst)), previousVertex, newVertex)

            previousVertex = newVertex
            distance -= tmp_dst
            previousSrc = segment(0)._1
            currentBuff = Array[(Long, Double, Double)]()
          }    
        }
        else 
        {
          currentBuff +:= segment(0)
        }
      }}}
      
      
      //from last node in the inBuff to the dst
      val tmp_dist = distance
      distance += dist(wayNodesBuff.last._2, wayNodesBuff.last._3, triplet.dstAttr._1, triplet.dstAttr._2)
      if (distance > maxDist){
        if (wayNodesBuff.last._1 != previousSrc){
            newVertex = (wayNodesBuff.last._1, (wayNodesBuff.last._2, wayNodesBuff.last._3, Map(wayId -> (currentBuff, Array[(Long, Double, Double)]()))))
            finalResult +:= (Edges(previousSrc, wayNodesBuff.last._1, (wayId, tmp_dist)), previousVertex, newVertex) 
            previousVertex = newVertex
            distance -= tmp_dist
            previousSrc = wayNodesBuff.last._1 
            currentBuff = Array[(Long, Double, Double)]()
            newVertex = (triplet.dstId, (triplet.dstAttr._1, triplet.dstAttr._2, Map(wayId -> (currentBuff, triplet.dstAttr._3(wayId)._2))) )
        }
      }
      finalResult +:= (Edges(previousSrc, triplet.dstId, (wayId, distance)), previousVertex, newVertex)
      
    }
    // Distance > threshold but no nodes in the way (buffer)
    else
    {
      finalResult +:= (Edges(triplet.srcId, triplet.dstId, triplet.attr), (triplet.srcId, triplet.srcAttr), (triplet.dstId, triplet.dstAttr))
    }
  }
  // Distance < threshold
  else
  {
    finalResult +:= (Edges(triplet.srcId, triplet.dstId, triplet.attr), (triplet.srcId, triplet.srcAttr), (triplet.dstId, triplet.dstAttr))
  }
  
  // return
  finalResult
}}
import org.apache.spark.graphx.{Edge=>Edges}
splittedEdges: org.apache.spark.rdd.RDD[(org.apache.spark.graphx.Edge[(Long, Double)], (Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])])), (Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])])))] = MapPartitionsRDD[776] at flatMap at command-588572986432369:2
// Taking each edge and its reverse
val segmentedEdges = splittedEdges.flatMap{case(edge, srcVertex, dstVertex) => Array(edge) ++ Array(Edges(edge.dstId, edge.srcId, edge.attr))}
segmentedEdges.count() 
segmentedEdges: org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[(Long, Double)]] = MapPartitionsRDD[777] at flatMap at command-588572986432370:2
res70: Long = 16
segmentedEdges.take(36).foreach(println)
Edge(3963994985,25735257,(393182257,68.4570414333903))
Edge(25735257,3963994985,(393182257,68.4570414333903))
Edge(455006648,312353,(733389337,74.36517408391025))
Edge(312353,455006648,(733389337,74.36517408391025))
Edge(2187779764,25734373,(299906437,17.439956081003103))
Edge(25734373,2187779764,(299906437,17.439956081003103))
Edge(312353,2187779764,(299906437,83.2953837674163))
Edge(2187779764,312353,(299906437,83.2953837674163))
Edge(312363,25735257,(263934973,94.17321564547117))
Edge(25735257,312363,(263934973,94.17321564547117))
Edge(25735257,3067700641,(263934973,39.0782384063323))
Edge(3067700641,25735257,(263934973,39.0782384063323))
Edge(25734373,3431600977,(73834008,67.891710670905))
Edge(3431600977,25734373,(73834008,67.891710670905))
Edge(3067700641,2206536278,(302521477,82.6456450149808))
Edge(2206536278,3067700641,(302521477,82.6456450149808))
Edge(3067700668,312363,(263934971,5.743347106374985))
Edge(312363,3067700668,(263934971,5.743347106374985))
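Mirroring each directed edge, as done above, models two-way roads: every segment is traversable in both directions. The same idea on plain tuples (edge ids taken from the output above, but the sketch itself is illustrative):

```scala
// mirror each directed edge so the graph is traversable both ways
val directed = Seq((3963994985L, 25735257L), (455006648L, 312353L))
val undirected = directed.flatMap { case (s, d) => Seq((s, d), (d, s)) }
println(undirected.size) // 4
```

This is why the count above is 16, exactly twice the 8 weighted edges.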
// Taking the individual vertices
val segmentedVertices = splittedEdges.flatMap{case(edge, srcVertex, dstVertex) => Array(srcVertex) ++ Array(dstVertex)}

segmentedVertices.map(node => node._1).distinct().take(16)
//25812013, 455006648, 25735257, 3431600977, 3963994985, 3067700641, 312353, 312363, 3067700668, 25734373, 2206536278) initial nodes 
segmentedVertices: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = MapPartitionsRDD[778] at flatMap at command-588572986432372:2
res71: Array[Long] = Array(455006648, 25735257, 3431600977, 3963994985, 3067700641, 312353, 312363, 3067700668, 25734373, 2206536278)
// Converting the vertices to a df
val verticesDF = segmentedVertices.toDF("nodeId","attr").select($"nodeId",$"attr._1".as("lat"),$"attr._2".as("long"),explode($"attr._3"))
    .withColumnRenamed("key","wayId").withColumnRenamed("value","buffers")
    .select($"nodeId",$"lat",$"long",$"wayId",$"buffers._1".as("inBuff"),$"buffers._2".as("outBuff"))
  
verticesDF.show(24,false)
+----------+------------------+------------------+---------+-----------------------------------------------------------------------------------+----------------+
|nodeId    |lat               |long              |wayId    |inBuff                                                                             |outBuff         |
+----------+------------------+------------------+---------+-----------------------------------------------------------------------------------+----------------+
|3963994985|59.857381800000006|17.645299100000003|393182257|[]                                                                                 |[]              |
|25735257  |59.8569759        |17.644382         |393182257|[]                                                                                 |[]              |
|25735257  |59.8569759        |17.644382         |263934973|[[3067700665, 59.8575443, 17.6433633]]                                             |[]              |
|455006648 |59.857930700000004|17.6450031        |733389337|[]                                                                                 |[]              |
|455006648 |59.857930700000004|17.6450031        |302521479|[[-1, 0.0, 0.0]]                                                                   |[[-1, 0.0, 0.0]]|
|312353    |59.857437700000006|17.645897700000003|733389337|[[1523899738, 59.8575528, 17.645685500000003]]                                     |[]              |
|312353    |59.857437700000006|17.645897700000003|299906437|[]                                                                                 |[]              |
|312353    |59.857437700000006|17.645897700000003|733389337|[[1523899738, 59.8575528, 17.645685500000003]]                                     |[]              |
|312353    |59.857437700000006|17.645897700000003|299906437|[]                                                                                 |[]              |
|25734373  |59.8567674        |17.6471041        |299906437|[[801437007, 59.8571596, 17.6463952], [2187779764, 59.856883200000006, 17.6468947]]|[]              |
|25734373  |59.8567674        |17.6471041        |73834008 |[]                                                                                 |[]              |
|312363    |59.857601900000006|17.6432529        |263934973|[]                                                                                 |[]              |
|312363    |59.857601900000006|17.6432529        |263934971|[]                                                                                 |[]              |
|25735257  |59.8569759        |17.644382         |393182257|[]                                                                                 |[]              |
|25735257  |59.8569759        |17.644382         |263934973|[[3067700665, 59.8575443, 17.6433633]]                                             |[]              |
|25735257  |59.8569759        |17.644382         |393182257|[]                                                                                 |[]              |
|25735257  |59.8569759        |17.644382         |263934973|[[3067700665, 59.8575443, 17.6433633]]                                             |[]              |
|3067700641|59.856720800000005|17.6448606        |263934973|[]                                                                                 |[]              |
|3067700641|59.856720800000005|17.6448606        |302521477|[]                                                                                 |[]              |
|25734373  |59.8567674        |17.6471041        |299906437|[[801437007, 59.8571596, 17.6463952], [2187779764, 59.856883200000006, 17.6468947]]|[]              |
|25734373  |59.8567674        |17.6471041        |73834008 |[]                                                                                 |[]              |
|3431600977|59.85631480000001 |17.6479153        |73834008 |[[312352, 59.85636590000001, 17.6478229]]                                          |[]              |
|3067700641|59.856720800000005|17.6448606        |263934973|[]                                                                                 |[]              |
|3067700641|59.856720800000005|17.6448606        |302521477|[]                                                                                 |[]              |
+----------+------------------+------------------+---------+-----------------------------------------------------------------------------------+----------------+
only showing top 24 rows

verticesDF: org.apache.spark.sql.DataFrame = [nodeId: bigint, lat: double ... 4 more fields]
//unique wayIds of the edges
val nodesWayId = splittedEdges.map{case(edge, srcVertex, dstVertex) => edge.attr._1}.toDF("nodesWayId").dropDuplicates() 
nodesWayId.show(10)
+----------+
|nodesWayId|
+----------+
| 393182257|
| 733389337|
| 299906437|
| 263934973|
|  73834008|
| 302521477|
| 263934971|
+----------+

nodesWayId: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [nodesWayId: bigint]
// Keep only the vertices whose wayId does not appear in any edge
// A dead end means the way contains no other intersection vertex
val verticesWithDeadEndWays = verticesDF.join(nodesWayId, $"nodesWayId" === $"wayId", "leftanti") //leftanti returns only the left rows that have no match on the right
verticesWithDeadEndWays.show(20,false)
+---------+------------------+----------+---------+----------------+----------------+
|nodeId   |lat               |long      |wayId    |inBuff          |outBuff         |
+---------+------------------+----------+---------+----------------+----------------+
|455006648|59.857930700000004|17.6450031|302521479|[[-1, 0.0, 0.0]]|[[-1, 0.0, 0.0]]|
+---------+------------------+----------+---------+----------------+----------------+

verticesWithDeadEndWays: org.apache.spark.sql.DataFrame = [nodeId: bigint, lat: double ... 4 more fields]
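The left-anti join used above can be mimicked on plain Scala collections. A minimal sketch (with hypothetical node/way ids, not the OSM values) of keeping only the vertices whose wayId matches no edge:

```scala
// Vertices tagged with the way they belong to, and the wayIds seen on edges.
val vertexWayIds = Seq(455006648L -> 302521479L, 312363L -> 263934971L)
val edgeWayIds = Set(263934971L, 393182257L)

// A left-anti join keeps left rows with NO match on the right.
val deadEndVertices = vertexWayIds.filter { case (_, wayId) => !edgeWayIds.contains(wayId) }
```

Only the vertex on way 302521479 survives, mirroring the single dead-end vertex found above.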
//convert df to rdd to be joined later with the rest of the vertices
import scala.collection.mutable.WrappedArray
val verticesWithDeadEndWaysRDD = verticesWithDeadEndWays.rdd.map(row => (row.getLong(0),(row.getDouble(1),row.getDouble(2),Map(row.getLong(3)-> (row.getAs[WrappedArray[(Long, Double, Double)]](4).array,row.getAs[WrappedArray[(Long, Double, Double)]](5).array)))))

verticesWithDeadEndWaysRDD.take(10)
import scala.collection.mutable.WrappedArray
verticesWithDeadEndWaysRDD: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = MapPartitionsRDD[820] at map at command-588572986432376:3
res80: Array[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = Array((455006648,(59.857930700000004,17.6450031,Map(302521479 -> (Array([-1,0.0,0.0]),Array([-1,0.0,0.0]))))))
// for a node appearing in different ways, returns one vertex for each way
val verticesWithSharedWays = splittedEdges.flatMap{case(edge, srcVertex, dstVertex) => 
  {
    val srcVertex1 = (srcVertex._1,(srcVertex._2._1,srcVertex._2._2,Map(edge.attr._1 -> srcVertex._2._3(edge.attr._1))))
    val dstVertex1 = (dstVertex._1,(dstVertex._2._1,dstVertex._2._2,Map(edge.attr._1 -> dstVertex._2._3(edge.attr._1))))

    Array(srcVertex1) ++ Array(dstVertex1)
  }}.distinct()


verticesWithSharedWays.take(10)
verticesWithSharedWays: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = MapPartitionsRDD[824] at distinct at command-588572986432377:8
res81: Array[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = Array((25735257,(59.8569759,17.644382,Map(393182257 -> (Array(),Array())))), (312363,(59.857601900000006,17.6432529,Map(263934971 -> (Array(),Array())))), (312353,(59.857437700000006,17.645897700000003,Map(299906437 -> (Array(),Array())))), (3963994985,(59.857381800000006,17.645299100000003,Map(393182257 -> (Array(),Array())))), (25735257,(59.8569759,17.644382,Map(263934973 -> (Array((3067700665,59.8575443,17.6433633)),Array())))), (3431600977,(59.85631480000001,17.6479153,Map(73834008 -> (Array((312352,59.85636590000001,17.6478229)),Array())))), (3067700641,(59.856720800000005,17.6448606,Map(302521477 -> (Array(),Array())))), (25734373,(59.8567674,17.6471041,Map(73834008 -> (Array(),Array())))), (2206536278,(59.85618040000001,17.6458707,Map(302521477 -> (Array((2206536285,59.8563708,17.645517400000003), (25734470,59.8562881,17.6456634)),Array())))), (3067700668,(59.857640200000006,17.6431843,Map(263934971 -> (Array(),Array())))))
// Union verticesWithSharedWays with verticesWithDeadEndWaysRDD, then reduce by key, merging the per-way maps
val allVertices = verticesWithSharedWays.union(verticesWithDeadEndWaysRDD).reduceByKey((a,b) => (a._1, a._2, a._3 ++ b._3)) 
allVertices.count()
allVertices: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = ShuffledRDD[826] at reduceByKey at command-588572986432378:2
res82: Long = 10
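The `reduceByKey` above merges duplicate vertex records with `++`; the merge itself is ordinary Scala `Map` concatenation. A sketch with illustrative values (not the real buffer contents):

```scala
// Two records for the same nodeId, each carrying one way in its map.
val a = (59.8569759, 17.644382, Map(393182257L -> "infoA"))
val b = (59.8569759, 17.644382, Map(263934973L -> "infoB"))

// The reduce function keeps the coordinates and concatenates the maps,
// so a node shared by several ways ends up with one entry per way.
val merged = (a._1, a._2, a._3 ++ b._3)
```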
import org.apache.spark.graphx.Graph
val segmentedGraph = Graph(allVertices, segmentedEdges).cache()
import org.apache.spark.graphx.Graph
segmentedGraph: org.apache.spark.graphx.Graph[(Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]),(Long, Double)] = org.apache.spark.graphx.impl.GraphImpl@2cb44552
//allVertices.map(vertex => (vertex._1,(vertex._2._1, vertex._2._2))).toDF("id","coordinates").write.mode("overwrite").parquet("dbfs:/graphs/uppsala/vertices")
// spark.read.parquet("dbfs:/graphs/uppsala/edges").rdd.take(1)
res88: Array[org.apache.spark.sql.Row] = Array([2187779764,25734373,[299906437,17.439956081003103]])
segmentedGraph.vertices.take(11) 
res33: Array[(org.apache.spark.graphx.VertexId, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = Array((25735257,(59.8569759,17.644382,Map(263934973 -> (Array((3067700665,59.8575443,17.6433633)),Array()), 393182257 -> (Array(),Array())))), (2187779764,(59.856883200000006,17.6468947,Map(299906437 -> (Array((801437007,59.8571596,17.6463952), (801437007,59.8571596,17.6463952)),Array())))), (3431600977,(59.85631480000001,17.6479153,Map(73834008 -> (Array((312352,59.85636590000001,17.6478229)),Array())))), (3963994985,(59.857381800000006,17.645299100000003,Map(393182257 -> (Array(),Array())))), (3067700641,(59.856720800000005,17.6448606,Map(263934973 -> (Array(),Array()), 302521477 -> (Array(),Array())))), (3067700668,(59.857640200000006,17.6431843,Map(263934971 -> (Array(),Array())))), (2206536278,(59.85618040000001,17.6458707,Map(302521477 -> (Array((2206536285,59.8563708,17.645517400000003), (25734470,59.8562881,17.6456634)),Array())))), (455006648,(59.857930700000004,17.6450031,Map(733389337 -> (Array(),Array()), 302521479 -> (Array([-1,0.0,0.0]),Array([-1,0.0,0.0]))))), (312353,(59.857437700000006,17.645897700000003,Map(733389337 -> (Array((1523899738,59.8575528,17.645685500000003)),Array()), 299906437 -> (Array(),Array())))), (312363,(59.857601900000006,17.6432529,Map(263934973 -> (Array(),Array()), 263934971 -> (Array(),Array())))), (25734373,(59.8567674,17.6471041,Map(73834008 -> (Array(),Array()), 299906437 -> (Array(),Array())))))
segmentedGraph.edges.count
res34: Long = 18
segmentedGraph.edges.take(18).foreach(println)
Edge(25735257,3963994985,(393182257,68.4570414333903))
Edge(3963994985,25735257,(393182257,68.4570414333903))
Edge(312353,455006648,(733389337,74.36517408391025))
Edge(455006648,312353,(733389337,74.36517408391025))
Edge(312353,2187779764,(299906437,83.2953837674163))
Edge(25734373,2187779764,(299906437,17.439956081003103))
Edge(2187779764,312353,(299906437,83.2953837674163))
Edge(2187779764,25734373,(299906437,17.439956081003103))
Edge(312363,25735257,(263934973,94.17321564547117))
Edge(25735257,312363,(263934973,94.17321564547117))
Edge(25735257,3067700641,(263934973,39.0782384063323))
Edge(3067700641,25735257,(263934973,39.0782384063323))
Edge(25734373,3431600977,(73834008,67.891710670905))
Edge(3431600977,25734373,(73834008,67.891710670905))
Edge(2206536278,3067700641,(302521477,82.6456450149808))
Edge(3067700641,2206536278,(302521477,82.6456450149808))
Edge(312363,3067700668,(263934971,5.743347106374985))
Edge(3067700668,312363,(263934971,5.743347106374985))
val G1 = segmentedGraph.edges.toDF().select($"srcId".as("src"), $"dstId".as("dest"), lit(1L).as("count"))

d3.graphs.force(
  height = 1000,
  width = 1000,
  clicks = G1.as[d3.Edge])

ScaDaMaLe Course site and book

Creating a road graph from OpenStreetMap (OSM) data with GraphX

Stavroula Rafailia Vlachou (LinkedIn), Virginia Jimenez Mohedano (LinkedIn) and Raazesh Sainudiin (LinkedIn).

This project was supported by SENSMETRY through a Data Science Project Internship 
between 2022-01-17 and 2022-06-05 to Stavroula R. Vlachou and Virginia J. Mohedano 
and Databricks University Alliance with infrastructure credits from AWS to 
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.

2022, Uppsala, Sweden

This project builds on top of the work of Dillon George (2016-2018).

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Download the road-network representation of Lithuania from the OSM data distributed by GeoFabrik: https://download.geofabrik.de/europe/lithuania.html

curl -O https://download.geofabrik.de/europe/lithuania-latest.osm.pbf
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0  155M    0  512k    0     0   906k      0  0:02:55 --:--:--  0:02:55  906k
 19  155M   19 30.4M    0     0  19.5M      0  0:00:07  0:00:01  0:00:06 19.5M
 41  155M   41 64.1M    0     0  25.0M      0  0:00:06  0:00:02  0:00:04 25.0M
 65  155M   65  101M    0     0  28.5M      0  0:00:05  0:00:03  0:00:02 28.5M
 92  155M   92  143M    0     0  31.5M      0  0:00:04  0:00:04 --:--:-- 31.4M
100  155M  100  155M    0     0  32.1M      0  0:00:04  0:00:04 --:--:-- 36.3M
dbutils.fs.mv("file:/databricks/driver/lithuania-latest.osm.pbf", "dbfs:/datasets/osm/lithuania/lithuania.osm.pbf")
res6: Boolean = true
import crosby.binary.osmosis.OsmosisReader

import org.apache.hadoop.mapreduce.{TaskAttemptContext, JobContext}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

import org.openstreetmap.osmosis.core.container.v0_6.EntityContainer
import org.openstreetmap.osmosis.core.domain.v0_6._
import org.openstreetmap.osmosis.core.task.v0_6.Sink

import sqlContext.implicits._

import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.Map
import scala.collection.JavaConversions._

import org.apache.spark.sql.functions._

import org.apache.spark.graphx._
import crosby.binary.osmosis.OsmosisReader
import org.apache.hadoop.mapreduce.{TaskAttemptContext, JobContext}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.openstreetmap.osmosis.core.container.v0_6.EntityContainer
import org.openstreetmap.osmosis.core.domain.v0_6._
import org.openstreetmap.osmosis.core.task.v0_6.Sink
import sqlContext.implicits._
import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.Map
import scala.collection.JavaConversions._
import org.apache.spark.sql.functions._
import org.apache.spark.graphx._
  • To ingest the entire OSM Lithuanian road-network dataset, the PBF file obtained from OSM is transformed into three Parquet files, one for each primitive (nodes, ways and relations), using methods of the osm-parquetizer project. The first two files, corresponding to the nodes and ways, are then transferred into the distributed file system for further processing.

Install the osm-parquetizer in the cluster

Clone the repository of the osm-parquetizer project and build the library that will be uploaded to the cluster.

It will later be used to load the OSM data faster.

//Run this command only once per cluster 
%sh 
java -jar /dbfs/FileStore/jars/2706d711_3963_4d88_92e7_a8870d0164d1-osm_parquetizer_1_0_1_SNAPSHOT-80d25.jar /dbfs/datasets/osm/lithuania/lithuania.osm.pbf
2022-04-06 07:40:54 INFO  CodecPool:153 - Got brand-new compressor [.snappy]
2022-04-06 07:40:55 INFO  CodecPool:153 - Got brand-new compressor [.snappy]
2022-04-06 07:40:55 INFO  CodecPool:153 - Got brand-new compressor [.snappy]
2022-04-06 07:40:58 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 1000000
2022-04-06 07:40:59 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 2000000
2022-04-06 07:41:00 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 3000000
2022-04-06 07:41:02 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 4000000
2022-04-06 07:41:03 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 5000000
2022-04-06 07:41:04 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 6000000
2022-04-06 07:41:11 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 7000000
2022-04-06 07:41:12 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 8000000
2022-04-06 07:41:13 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 9000000
2022-04-06 07:41:14 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 10000000
2022-04-06 07:41:15 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 11000000
2022-04-06 07:41:16 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 12000000
2022-04-06 07:41:23 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 13000000
2022-04-06 07:41:24 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 14000000
2022-04-06 07:41:25 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 15000000
2022-04-06 07:41:25 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 16000000
2022-04-06 07:41:27 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 17000000
2022-04-06 07:41:28 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 18000000
2022-04-06 07:41:29 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 19000000
2022-04-06 07:41:36 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 20000000
2022-04-06 07:41:37 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 21000000
2022-04-06 07:41:43 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 22000000
2022-04-06 07:41:48 INFO  App$MultiEntitySinkObserver:118 - Entities processed: 23000000
2022-04-06 07:42:01 INFO  App$MultiEntitySinkObserver:125 - Total entities processed: 23727209
ls /dbfs/datasets/osm/lithuania/
lithuania.osm.pbf
lithuania.osm.pbf.node.parquet
lithuania.osm.pbf.relation.parquet
lithuania.osm.pbf.way.parquet

Read the parquet files of the nodes and ways obtained from the osm-parquetizer.

spark.conf.set("spark.sql.parquet.binaryAsString", true)

val nodes_df = spark.read.parquet("dbfs:/datasets/osm/lithuania/lithuania.osm.pbf.node.parquet")
val ways_df = spark.read.parquet("dbfs:/datasets/osm/lithuania/lithuania.osm.pbf.way.parquet")
nodes_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 7 more fields]
ways_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 6 more fields]
  • The list of tags chosen for this work. For the semantic meaning of each tag, see the OSM description. The list is non-exhaustive and should be adapted to the desired granularity and level of detail of the project at hand.
val allowableWays = Seq(
  "motorway",
  "motorway_link",
  "trunk",
  "trunk_link",
  "primary",
  "primary_link",
  "secondary",
  "secondary_link",
  "tertiary",
  "tertiary_link",
  "living_street",
  "residential",
  "road",
  "construction",
  "motorway_junction"
)
allowableWays: Seq[String] = List(motorway, motorway_link, trunk, trunk_link, primary, primary_link, secondary, secondary_link, tertiary, tertiary_link, living_street, residential, road, construction, motorway_junction)
//convert the nodes to Dataset containing the fields of interest

case class NodeEntry(nodeId: Long, latitude: Double, longitude: Double, tags: Seq[String])

val nodeDS = nodes_df.map(node => 
  NodeEntry(node.getAs[Long]("id"),
       node.getAs[Double]("latitude"),
       node.getAs[Double]("longitude"),
       node.getAs[Seq[Row]]("tags").map{case Row(key:String, value:String) => value}
)).cache()
defined class NodeEntry
nodeDS: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]
nodeDS.count()
res2: Long = 21212155
//convert the ways to Dataset containing the fields of interest

case class WayEntry(wayId: Long, tags: Array[String], nodes: Array[Long])

val wayDS = ways_df.flatMap(way => {
        val tagSet = way.getAs[Seq[Row]]("tags").map{case Row(key:String, value:String) =>  value}.toArray
        if (tagSet.intersect(allowableWays).nonEmpty ){
            Array(WayEntry(way.getAs[Long]("id"),
            tagSet,
            way.getAs[Seq[Row]]("nodes").map{case Row(index:Integer, nodeId:Long) =>  nodeId}.toArray
            ))
        }
        else { Array[WayEntry]()}
}
).cache()
defined class WayEntry
wayDS: org.apache.spark.sql.Dataset[WayEntry] = [wayId: bigint, tags: array<string> ... 1 more field]
wayDS.count()
res4: Long = 137540
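The `flatMap` above keeps a way only when its tag values intersect `allowableWays`; the core test is a plain collection intersection. A self-contained sketch with made-up tag lists:

```scala
val allowableWays = Seq("motorway", "residential", "primary")

// One drivable way, one footpath (hypothetical tag sets).
val roadTags = Array("residential", "lit")
val pathTags = Array("footway", "lit")

// A way is kept iff at least one of its tag values is an allowable highway type.
def isRoad(tags: Array[String]): Boolean = tags.intersect(allowableWays).nonEmpty
```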
val nodeCounts = wayDS
                    .select(explode('nodes).as("node"))
                    .groupBy('node).count
nodeCounts: org.apache.spark.sql.DataFrame = [node: bigint, count: bigint]
  • An intersection node is defined here as a node that lies in at least two ways.
val intersectionNodes = nodeCounts.filter('count >= 2).select('node.alias("intersectionNode"))
val true_intersections = intersectionNodes
intersectionNodes: org.apache.spark.sql.DataFrame = [intersectionNode: bigint]
true_intersections: org.apache.spark.sql.DataFrame = [intersectionNode: bigint]
intersectionNodes.count()
res8: Long = 162325
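The intersection-node definition (a node appearing in at least two ways) can be checked on plain collections. A sketch of the explode/groupBy/count pipeline with hypothetical way-to-node lists:

```scala
// Each way is the list of nodeIds it passes through.
val ways = Seq(
  Seq(1L, 2L, 3L),  // way A
  Seq(3L, 4L, 5L),  // way B shares node 3 with A
  Seq(5L, 2L)       // way C shares node 5 with B and node 2 with A
)

// explode + groupBy + count, on Scala collections.
val counts = ways.flatten.groupBy(identity).map { case (node, occs) => node -> occs.size }

// Nodes occurring at least twice are intersection candidates.
val intersections = counts.filter(_._2 >= 2).keySet
```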
val distinctNodesWays = wayDS.flatMap(_.nodes).distinct //the distinct nodes within the ways 
distinctNodesWays: org.apache.spark.sql.Dataset[Long] = [value: bigint]
distinctNodesWays.count()
res10: Long = 1299907
val wayNodes = nodeDS.as("nodes") 
  .joinWith(distinctNodesWays.as("ways"), $"ways.value" === $"nodes.nodeId")
  .map(_._1).cache
wayNodes: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]
wayNodes.count()
res12: Long = 1299907
val intersectionSetVal = intersectionNodes.as[Long].collect.toSet; //turn intersectionNodes to Set 
intersectionSetVal: scala.collection.immutable.Set[Long] = Set(3954894392, 1028098141, 8327933356, 1192596601, 1036402120, 5840172474, 691993192, 7280204168, 3837546128, 1509692779, 3774745375, 2888929887, 3882298102, 4456063981, 1812836277, 6219174203, 1132762870, 2704534617, 1036358572, 1314515551, 5887601785, 3472814007, 935011580, 2266417234, 2218477159, 3830971192, 3758026612, 2628269378, 2450295578, 2036730950, 4014928315, 4047561472, 3742211751, 417473667, 710972352, 1240304711, 2344640802, 3175136574, 3610788315, 1152426347, 3843702680, 2135301596, 3463371091, 2578259945, 2272646209, 9288252126, 8659906497, 5046236674, 3882606462, 6853150636, 2348202899, 1827020895, 1034351953, 2872587837, 7598921441, 4441135707, 7154408678, 2143313902, 6358524504, 1827841626, 51401434, 2104687370, 3169288908, 2203661858, 509277213, 7398865298, 2706131803, 7020673974, 2482655992, 410873070, 40599892, 2718581564, 1136446055, 2612123258, 5856761891, 896143820, 1723158680, 3692175721, 7973969958, 2596488268, 2746044544, 1145714624, 1057404723, 412963083, 81203920, 1258193303, 7277561125, 5215875721, 9119375173, 2095081588, 1017873867, 1151243019, 1848119391, 1924034959, 277888047, 3124645299, 3796300978, 34825612, 2234211037, 2378775918, 2533534558, 3387536812, 262278719, 3539046584, 2600271017, 2343507669, 6198589614, 798855679, 2955786090, 31452099, 2255897649, 4069602981, 5821435649, 8510338674, 9118314553, 727235490, 1632026970, 1138846538, 7817950226, 9500011302, 2491378894, 2659296775, 6510669593, 2245343559, 1549190307, 4723634502, 5975664368, 834528382, 1144264294, 3398600261, 2934302676, 2620066696, 2512528358, 1026705634, 3846262838, 4944746627, 4475382645, 2045995025, 2043449667, 2800589982, 6562241076, 6466080351, 1639532336, 8006034862, 4572083119, 4102652469, 2135301330, 1848200775, 2725675664, 321982547, 2379130060, 9236572852, 1834676531, 7342960659, 3481040164, 3773275282, 4723676068, 4508131633, 4426839331, 2419523846, 7279732924, 1156860008, 3591788118, 
1946671545, 1636124896, 3492717581, 4949411117, 1044390922, 6845662470, 371663507, 3385128084, 5962770335, 1242544881, 457526430, 981428783, 4961114540, 2262631506, 2297448445, 9603285842, 1474540484, 5940837836, 1700351712, 2320635099, 2146958637, 9270519896, 4872990619, 2928092139, 4425066655, 2206664581, 7280235988, 1535287197, 1183618876, 6485160551, 4411398966, 3991141456, 1628728176, 4889396284, 4759399991, 3946596452, 2229195631, 6327513918, 2033129308, 2585464548, 1800850400, 1104097855, 1801836841, 5543206580, 1733360477, 2192183580, 1286185458, 1039741678, 3071904626, 1479129180, 8159625958, 8153576878, 137356770, 9512132640, 2889406310, 9282951374, 2772945378, 7087700888, 299690975, 7496589529, 363422207, 1258118530, 3717631194, 2769080208, 873542203, 6203485294, 2213415589, 1826893692, 1295860361, 1991899638, 7142870533, 2636482814, 2228819220, 2486409125, 1408251311, 1163019147, 1986264613, 722809386, 1032791795, 2372253123, 7280236132, 3012402670, 410410926, 1632013830, 4185213749, 8422801035, 5082283666, 8390975598, 717553500, 8903760367, 1426181723, 2486831606, 2706132069, 7082082762, 3629722595, 1827841682, 1663312625, 1116078354, 809918023, 1634421784, 2294078407, 6371632243, 6942832214, 3103437026, 4778848191, 5609948419, 928079952, 2643946553, 2219142846, 3259798128, 972705422, 4532894711, 1147389163, 4200665722, 1621450716, 3446435974, 3508982006, 2914931552, 267993651, 8609633816, 1583332389, 8437833869, 3954868130, 2844494775, 307436405, 9537310383, 1011788287, 1218743199, 289957295, 1751074538, 1156335745, 1146179866, 3281587820, 8742132605, 7194943159, 730037695, 2210300341, 4067081255, 2120972958, 2431147071, 1822938498, 6538086425, 4002602910, 5353103914, 4983226891, 4213376931, 7637618131, 289974999, 897017041, 3784378983, 863076967, 2548257942, 664083103, 2107860391, 1933944950, 316984109, 7234677785, 2078466041, 1643397002, 1947951854, 1022668729, 3923653099, 430592144, 9526786026, 1647829111, 4350345841, 8299422853, 1388346027, 
1333297066, 5713163211, 1610127502, 1788952448, 2458903268, 3791809345, 948084432, 1316591108, 31451994, 3143625714, 2760666691, 3828309401, 2597263264, 3266940454, 1316448033, 440186291, 8180441284, 420327660, 509915470, 2514797704, 3780311361, 8559264289, 3394551775, 3212336035, 6679942853, 4709809401, 3653215774, 1583370181, 987421444, 1833211352, 1144264023, 6630983072, 3378700352, 59966924, 2349179912, 7262854702, 8467921475, 3440462099, 2575696642, 3026960820, 1218743216, 3027688464, 2229173598, 3410180667, 6556729270, 2291514844, 3593937592, 258334236, 3014185666, 4413684252, 3110697588, 2241312126, 3730438791, 5936924029, 3751286821, 9026315361, 2338538874, 3060553050, 2141652614, 5875853605, 3784808168, 9465276921, 2403703365, 1583370191, 7077003935, 1026388850, 7194944311, 5859419005, 2473521017, 3842318629, 3376486271, 802900320, 8252803189, 2208380506, 1156333576, 4483834287, 270417965, 7237069225, 360358257, 2210300378, 2420411103, 4535542163, 6216923884, 2051653884, 2636348593, 1472252644, 4993989546, 4109866667, 828100084, 1765999049, 2718581586, 8301377775, 1698483524, 7363586219, 2218608237, 2291119222, 3167440934, 2651981227, 1584052082, 4054735777, 309903458, 8603152470, 9476909282, 1409939961, 4301868799, 59597223, 2189861315, 267993634, 2929972394, 9225032695, 135506977, 2798742686, 975863053, 5311900085, 3488545429, 875843248, 2183169598, 34831827, 1838097699, 9374557330, 2977367244, 9431428140, 4843262174, 2338503812, 8865734959, 417459235, 3721512253, 2782161816, 1991225594, 4458953260, 5801770199, 3267338755, 4020271732, 2221734300, 2527141662, 7182319546, 1184532267, 3731435296, 1492744502, 1798375955, 3077206333, 519329015, 3729938977, 1682401466, 2291847466, 6964491980, 297195057, 3917165782, 5941024645, 3739153439, 3001058738, 5958222327, 8314763501, 1066074385, 316413715, 2278156350, 5347857056, 1666690939, 4183327417, 3856244360, 2064604400, 4250281533, 2060142472, 2403235372, 32841598, 2386188237, 1097991668, 2183690668, 2630516839, 
928079854, 6065705344, 307761081, 6353759494, 5353548326, 3404472949, 5781659914, 9498221622, 1425162913, 3577210444, 7279644065, 2225499506, 983975927, 9425700741, 1784462203, 1518862773, 5731650357, 460421744, 1057745021, 1869057080, 3835591790, 9556653974, 3287551559, 1388523717, 4895686278, 1669536064, 6679519540, 863767755, 279031554, 1675549394, 2218477211, 3733218337, 33696568, 2304694881, 1628815710, 2865789132, 3784627841, 2291591139, 4428630254, 1399469264, 82665220, 1184706242, 1132762659, 2249967109, 2206365893, 2132763286, 4011368562, 1376095268, 2204877189, 4548183278, 5496029012, 1869298008, 1362530886, 5743781594, 983988377, 9376200974, 1406174749, 7924188545, 7057373803, 5109656203, 6565466018, 6176044211, 3012400575, 4937071805, 2883700859, 2599495143, 2717130199, 3766909365, 5713162376, 1194658790, 5737707544, 307434796, 1232671903, 2636315241, 8259520715, 2305314281, 1622596513, 387188500, 5810398683, 3013782338, 8542643608, 1136279036, 3001000032, 1414401838, 1679739780, 196452763, 2184428808, 8480819385, 2162630967, 5737707289, 833173029, 9594321488, 1283445383, 918958651, 875887014, 957483875, 2598507082, 2098286970, 1834140639, 3394367897, 270955266, 2396052669, 2635681054, 2557177227, 738262392, 3852199229, 7253699761, 4014087118, 4772352886, 1317359171, 2457760635, 2081277045, 477279781, 6565459647, 994420612, 4103058760, 2206318153, 8609775508, 1408174608, 853085910, 1201326444, 969236676, 2234945974, 3385153797, 8807447960, 2667148223, 1717012137, 2392972139, 7179029668, 8354869319, 2571067773, 938485933, 6737257768, 5650317935, 1800873402, 5737707311, 6860929497, 2107860313, 3747571216, 3274638558, 882543449, 8148693091, 428894557, 33351960, 4834093689, 4319213787, 2199942142, 956228765, 4843283834, 774449468, 2278776107, 8942677479, 9245138140, 4382734481, 1582432158, 1369282096, 983978646, 1073909045, 1834040946, 759821178, 2609163849, 1119737116, 2354970285, 3387522495, 2272646056, 1026388570, 5958221291, 845238880, 5827167555, 
4781018322, 7296351534, 2882366814, 8408590758, 2720703388, 877483437, 1786405258, 3307278453, 3751405818, 7106436389, 2148903330, 5494802007, 3555422221, 60178273, 1099711886, 2585151355, 3974900096, 8428736128, 7954360226, 7598764758, 4375255572, 1240307170, 2205035462, 3714158040, 1316590749, 3001059117, 1562239376, 7176474600, 2352440636, 692905899, 1020477653, 4503154089, 5018686059, 1146179488, 6205024586, 2372252431, 1572488857, 468212277, 32444415, 841684817, 4994858487, 2484703166, 60735101, 416868242, 1473000644, 1106088085, 1833573595, 2108526263, 6794671031, 2331881331, 1304629131, 3583899803, 2314813365, 417459748, 307436400, 130184820, 1478353614, 6150026276, 5854494346, 613391647, 1789002993, 2093484483, 3883005032, 3651396832, 672383763, 7661640286, 2206307106, 672882902, 672383810, 2794473788, 628916719, 3351527927, 363415410, 2447375338, 1045091692, 1071180106, 2206664533, 509915492, 774102472, 2427890952, 1985569687, 1043674973, 2622769778, 1028097980, 7672655414, 903875660, 448106871, 1242375740, 3406764205, 1426181269, 2047701609, 2327214518, 32137407, 2565898533, 1378715716, 2391162479, 7254849968, 3105466444, 334074652, 3766872158, 3394225486, 7220822425, 1825537562, 2213482293, 213103643, 3762598178, 2299315684, 4149587762, 1190924291, 3771852909, 2249019343, 2234997832, 1234754014, 1436522954, 3539132652, 8777124688, 444382405, 3991092613, 5849683848, 3458040974, 1640691492, 1305960244, 8108544369, 2578259841, 6231031628, 3097531642, 4015318844, 1367988073, 2394951419, 984017310, 1584852514, 8486499516, 1430930928, 1425162944, 4292888171, 1821448220, 4007267980, 5132586877, 7065792490, 32070346, 1625011580, 1639391465, 2379921223, 2926434071, 1156355235, 1874799141, 1229897058, 2277634548, 2206390743, 5075625954, 3659517817, 2216244891, 1747606437, 669898636, 6298083892, 1962417346, 2156378184, 7255498135, 9309540804, 1156337096, 3783267692, 2706118851, 2218679650, 2457780505, 2436488867, 9248641052, 4419542059, 2579616352, 8655173923, 
3784465264, 3660955552, 967978729, 6718424372, 1599651948, 3076644627, 894241402, 2405003030, 2425004096, 838667615, 747170701, 2364173809, 509915338, 2079707910, 1367982120, 2592464646, 423134683, 7704615140, 360346943, 1093565436, 2346919436, 8078819601, 8479967689, 1139668748, 2486831545, 1323305098, 8974061782, 3255347567, 7695291822, 834596921, 6221589470, 3086066032, 364862229, 4014880532, 3385093676, 4147558019, 32325860, 59602095, 8914071661, 9205403028, 1091178299, 1640421992, 3235859832, 8553407645, 2088182463, 1066074463, 1144263463, 9399410933, 3614818103, 1037021065, 2339350243, 2234992547, 482372352, 8365607345, 9407899787, 2503238675, 4052869743, 4057398263, 2752337390, 6022389760, 3757229740, 3664167280, 1109786253, 9420175990, 1639229904, 4936629565, 7194943988, 948085015, 1859172543, 1037238437, 1475777575, 2378760997, 2202910372, 2250873936, 4059537745, 34823041, 833756825, 926087368, 2301065976, 8664365690, 5733289457, 2225499406, 316030135, 4485856605, 4183327389, 39720895, 7762319766, 1786533553, 3773213186, 4096000552, 3583815488, 8904198808, 4002388889, 1827843225, 1151245225, 1633730199, 1711891073, 1818530578, 4015319010, 1043674554, 9277109190, 1305078052, 1070210060, 993790515, 2293770972, 3075151948, 3001199427, 87871140, 8221996479, 1811038357, 8628592241, 3173095658, 905162047, 2163102239, 1068906801, 4428831607, 2245344052, 2299428801, 5466134779, 2044841002, 5037480685, 3783505652, 1955251123, 1568102895, 4100065051, 3923639220, 285961845, 3256252817, 5102966903, 2225499382, 2535189315, 1082645551, 2723635886, 2266375430, 707871678, 2519920004, 475246698, 9172574938, 1399469459, 1344873524, 5707764259, 2683347689, 1642173525, 846815313, 4934403402, 1153785449, 2771898228, 7248424922, 1566236272, 912155364, 7923015520, 1069856530, 1332023401, 5739790119, 1771110833, 367155036, 1250349086, 3456840005, 9213970172, 1072204527, 1421435056, 270904459)
import org.apache.spark.sql.functions.{collect_list, map, udf}
import org.apache.spark.sql.functions._

val remove_first_and_last = udf((x: Seq[Long]) => x.drop(1).dropRight(1))

val nodes = wayDS.
  select($"wayId", $"nodes").
  withColumn("node", explode($"nodes")).
  drop("nodes")

val get_first_and_last = udf((x: Seq[Long]) => {val first = x(0); val last = x.reverse(0); Array(first, last)})

val first_and_last_nodes = wayDS.
  select($"wayId", get_first_and_last($"nodes").as("nodes")).
  withColumn("node", explode($"nodes")).
  drop("nodes")

val dead_end_points = first_and_last_nodes.select($"node").distinct().withColumnRenamed("node", "value")

// Turn intersection set into a dataset to join (all values must be unique)
val intersections = intersectionNodes.union(dead_end_points).distinct      
 
val wayNodesLocated = nodes.join(wayNodes, wayNodes.col("nodeId") === nodes.col("node")).select($"wayId", $"node", $"latitude", $"longitude")


case class MappedWay(wayId: Long, labels_located: Seq[Map[Long, (Boolean, Double, Double)]])


val maps = wayNodesLocated.join(intersections, 'node === 'intersectionNode, "left_outer").
  // A left outer join returns all rows from the left DataFrame/Dataset regardless of whether a match is found on the right
    select($"wayId", $"node", $"intersectionNode".isNotNull.as("contains"), $"latitude", $"longitude").
   groupBy("wayId").agg(collect_list(map($"node", struct($"contains".as("_1"), $"latitude".as("_2"), $"longitude".as("_3")))).as("labels_located")).as[MappedWay] 
 

val combine = udf((nodes: Seq[Long], labels_located: Seq[scala.collection.immutable.Map[Long, (Boolean, Double, Double)]]) => {
  // If labels_located has no entry for "node", it is a start/end node; use placeholder values label = true, latitude = 0, longitude = 0 (revisit if real coordinates are needed here)
  val m = labels_located.map(_.toSeq).flatten.toMap

  nodes.map { node => (node, m.getOrElse(node, (true, 0D, 0D))) } // pair each node with its (label, latitude, longitude) info

})


val strSchema = "array<struct<nodeId:long, nodeInfo:struct<label:boolean, latitude:double, longitude: double>>>"
val labeledWays = wayDS.join(maps, "wayId")
                     .select($"wayId", $"tags", combine($"nodes", $"labels_located").as("labeledNodes").cast(strSchema))
import org.apache.spark.sql.functions.{collect_list, map, udf}
import org.apache.spark.sql.functions._
remove_first_and_last: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,ArrayType(LongType,false),Some(List(ArrayType(LongType,false))))
nodes: org.apache.spark.sql.DataFrame = [wayId: bigint, node: bigint]
get_first_and_last: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,ArrayType(LongType,false),Some(List(ArrayType(LongType,false))))
first_and_last_nodes: org.apache.spark.sql.DataFrame = [wayId: bigint, node: bigint]
dead_end_points: org.apache.spark.sql.DataFrame = [value: bigint]
intersections: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [intersectionNode: bigint]
wayNodesLocated: org.apache.spark.sql.DataFrame = [wayId: bigint, node: bigint ... 2 more fields]
defined class MappedWay
maps: org.apache.spark.sql.Dataset[MappedWay] = [wayId: bigint, labels_located: array<map<bigint,struct<_1:boolean,_2:double,_3:double>>>]
combine: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,ArrayType(StructType(StructField(_1,LongType,false), StructField(_2,StructType(StructField(_1,BooleanType,false), StructField(_2,DoubleType,false), StructField(_3,DoubleType,false)),true)),true),Some(List(ArrayType(LongType,false), ArrayType(MapType(LongType,StructType(StructField(_1,BooleanType,false), StructField(_2,DoubleType,false), StructField(_3,DoubleType,false)),true),true))))
strSchema: String = array<struct<nodeId:long, nodeInfo:struct<label:boolean, latitude:double, longitude: double>>>
labeledWays: org.apache.spark.sql.DataFrame = [wayId: bigint, tags: array<string> ... 1 more field]
case class Intersection(OSMId: Long , latitude: Double, longitude: Double, inBuf: ArrayBuffer[(Long, Double, Double)], outBuf: ArrayBuffer[(Long, Double, Double)])

val segmentedWays = labeledWays.map(way => {
  
  val labeledNodes = way.getAs[Seq[Row]]("labeledNodes").map{case Row(k: Long, Row(v: Boolean, w:Double, x:Double)) => (k, v,w,x)}.toSeq //labeledNodes: (nodeid, label, lat, long)
  val wayId = way.getAs[Long]("wayId")
  
  val indexedNodes: Seq[((Long, Boolean, Double, Double), Int)] = labeledNodes.zipWithIndex // pairs each labeled node with its index within the way
  
  val intersections = ArrayBuffer[Intersection]()  
  
  val currentBuffer = ArrayBuffer[(Long, Double, Double)]()
  
  val way_length = labeledNodes.length //number of nodes in a way
  
  if (way_length == 1) {

    val intersect = new Intersection(labeledNodes(0)._1, labeledNodes(0)._3, labeledNodes(0)._4, ArrayBuffer((-1L, 0D, 0D)), ArrayBuffer((-1L, 0D, 0D))) //include lat and long info

    var result = Array((intersect.OSMId, intersect.latitude, intersect.longitude, intersect.inBuf.toArray, intersect.outBuf.toArray))
    (wayId, result) //return
  }
  else {
    indexedNodes.foreach{ case ((id, isIntersection, latitude, longitude), i) => // id is nodeId and isIntersection is the node's boolean label
      if (isIntersection) {
        val newEntry = new Intersection(id, latitude, longitude, currentBuffer.clone, ArrayBuffer[(Long, Double, Double)]())
        intersections += newEntry
        currentBuffer.clear
      }
      else {
        currentBuffer ++= Array((id, latitude, longitude))  //if the node is not an intersection append the nodeId to the current buffer 
      }
      
      // If we reach the end of the way while currentBuffer is not empty,
      // append currentBuffer to the last existing intersection
      if (i == way_length - 1 && !currentBuffer.isEmpty) {  
        if (intersections.isEmpty){
        intersections += new Intersection(-1, 0D, 0D, currentBuffer, ArrayBuffer[(Long, Double, Double)]()) 
        }
        else {
          intersections.last.outBuf ++= currentBuffer
        }
        currentBuffer.clear
      }
    }
    var result = intersections.map(i => (i.OSMId, i.latitude, i.longitude, i.inBuf.toArray, i.outBuf.toArray)).toArray  
    (wayId, result) 
  }
})
defined class Intersection
segmentedWays: org.apache.spark.sql.Dataset[(Long, Array[(Long, Double, Double, Array[(Long, Double, Double)], Array[(Long, Double, Double)])])] = [_1: bigint, _2: array<struct<_1:bigint,_2:double,_3:double,_4:array<struct<_1:bigint,_2:double,_3:double>>,_5:array<struct<_1:bigint,_2:double,_3:double>>>>]
val schema = "array<struct<nodeId:bigint,latitude:double,longitude:double,inBuff:array<struct<nodeId:bigint,latitude:double,longitude:double>>,outBuff:array<struct<nodeId:bigint,latitude:double,longitude:double>>>>"
segmentedWays.select($"_1".alias("wayId"), $"_2".cast(schema).alias("nodeInfo")).printSchema()
root
 |-- wayId: long (nullable = false)
 |-- nodeInfo: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- nodeId: long (nullable = true)
 |    |    |-- latitude: double (nullable = true)
 |    |    |-- longitude: double (nullable = true)
 |    |    |-- inBuff: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- nodeId: long (nullable = true)
 |    |    |    |    |-- latitude: double (nullable = true)
 |    |    |    |    |-- longitude: double (nullable = true)
 |    |    |-- outBuff: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- nodeId: long (nullable = true)
 |    |    |    |    |-- latitude: double (nullable = true)
 |    |    |    |    |-- longitude: double (nullable = true)

schema: String = array<struct<nodeId:bigint,latitude:double,longitude:double,inBuff:array<struct<nodeId:bigint,latitude:double,longitude:double>>,outBuff:array<struct<nodeId:bigint,latitude:double,longitude:double>>>>
//Unwrap the nested structure of the segmentedWays

val waySegmentDS = segmentedWays.flatMap(way => way._2.map(node => (way._1, node))) 
waySegmentDS: org.apache.spark.sql.Dataset[(Long, (Long, Double, Double, Array[(Long, Double, Double)], Array[(Long, Double, Double)]))] = [_1: bigint, _2: struct<_1: bigint, _2: double ... 3 more fields>]
import scala.collection.immutable.Map

val intersectionVertices = waySegmentDS
  .map(way => 
   //nodeId     latitude   longitude      wayId      inBuff      outBuff
   (way._2._1, (way._2._2, way._2._3, Map(way._1 -> (way._2._4, way._2._5))))) 
  .rdd
  //                     latitude, long, Map(wayId, inBuff, outBuff)
  .reduceByKey((a,b) => (a._1,     a._2, a._3 ++ b._3)) 

//intersectionVertices =  RDD[(nodeId, (latitude, longitude, wayMap(wayId -> inBuff, outBuff)))]
import scala.collection.immutable.Map
intersectionVertices: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = ShuffledRDD[122] at reduceByKey at command-1211269020742696:9
intersectionVertices.count()
res17: Long = 191991
val edges = segmentedWays
  .filter(way => way._2.length > 1) //ways with more than one node
  .flatMap{ case (wayId, nodes_info) => {  
             nodes_info.sliding(2) 
               .flatMap(segment => //segment is the pair of two nodes
                   List(Edge(segment(0)._1, segment(1)._1, wayId))
               )
   }}
edges: org.apache.spark.sql.Dataset[org.apache.spark.graphx.Edge[Long]] = [srcId: bigint, dstId: bigint ... 1 more field]
edges.count()
res19: Long = 237069
sc.setCheckpointDir("/_checkpoint") // just a directory in distributed file system
val edges_rdd = edges.rdd
intersectionVertices.checkpoint()
edges_rdd.checkpoint()
edges_rdd: org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[Long]] = MapPartitionsRDD[214] at rdd at command-1211269020742708:2
val roadGraph = Graph(intersectionVertices, edges_rdd).cache
roadGraph: org.apache.spark.graphx.Graph[(Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]),Long] = org.apache.spark.graphx.impl.GraphImpl@69447e5c
import com.esri.core.geometry.GeometryEngine.geodesicDistanceOnWGS84
import com.esri.core.geometry.Point
import com.esri.core.geometry.GeometryEngine.geodesicDistanceOnWGS84
import com.esri.core.geometry.Point
val weightedRoadGraph = roadGraph.mapTriplets{triplet => 
  def dist(lat1: Double, long1: Double, lat2: Double, long2: Double): Double = {
    val p1 = new Point(long1, lat1)
    val p2 = new Point(long2, lat2)
    geodesicDistanceOnWGS84(p1, p2)
  }
  
  val wayNodesInBuff = triplet.dstAttr._3(triplet.attr)._1 //dstAttr is the vertex attribute (latitude, longitude, wayMap(wayId -> inBuff, outBuff))
  
  if (wayNodesInBuff.isEmpty) {
      (triplet.attr, dist(triplet.srcAttr._1, triplet.srcAttr._2, triplet.dstAttr._1, triplet.dstAttr._2))
  
  } else {
      var distance: Double = 0.0

      distance += dist(triplet.srcAttr._1, triplet.srcAttr._2, wayNodesInBuff(0)._2, wayNodesInBuff(0)._3 )
    
      if (wayNodesInBuff.length > 1) {
      //accumulate the intermediate distances 
        distance += wayNodesInBuff.sliding(2).map{
        buff => dist(buff(0)._2, buff(0)._3, buff(1)._2, buff(1)._3)}
        .reduce(_ + _)
     }
    
      distance += dist(wayNodesInBuff.last._2, wayNodesInBuff.last._3, triplet.dstAttr._1, triplet.dstAttr._2)

      (triplet.attr, distance)
    }
  
}.cache
weightedRoadGraph: org.apache.spark.graphx.Graph[(Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]),(Long, Double)] = org.apache.spark.graphx.impl.GraphImpl@1645750f
weightedRoadGraph.edges.count() //number of edges 
res21: Long = 237069
weightedRoadGraph.edges.filter(edge => (edge.attr._2 > 100.0)).count() //number of edges exceeding the distance tolerance of 100 meters
res22: Long = 137207
weightedRoadGraph.vertices.count() //number of vertices 
res23: Long = 191991

Step 4 - Construction of Coarsened Road Graph

  • The distance tolerance here is set to 100 meters.
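The coarsening code below walks along each over-long edge and cuts it wherever the accumulated distance along the way exceeds the tolerance. The core idea can be sketched in plain Scala — a toy version only, using straight-line distance instead of geodesicDistanceOnWGS84, and assuming at least two points per way; the names coarsen and maxDist are illustrative, not from the notebook:

```scala
// Walk a way's points and emit a cut (srcId, dstId, dist) whenever the
// accumulated distance exceeds maxDist; the remainder forms a final segment.
// Assumes points.length >= 2; distance is planar Euclidean for illustration.
def coarsen(points: Seq[(Long, Double, Double)], maxDist: Double): Seq[(Long, Long, Double)] = {
  val cuts = scala.collection.mutable.ArrayBuffer[(Long, Long, Double)]()
  var prevId = points.head._1 // source of the segment currently being grown
  var acc = 0.0               // distance accumulated since the last cut
  points.sliding(2).foreach { case Seq((_, x1, y1), (id2, x2, y2)) =>
    acc += math.hypot(x2 - x1, y2 - y1)
    if (acc > maxDist) { cuts += ((prevId, id2, acc)); prevId = id2; acc = 0.0 }
  }
  if (acc > 0.0) cuts += ((prevId, points.last._1, acc)) // trailing short segment
  cuts.toSeq
}

val pts = Seq((1L, 0.0, 0.0), (2L, 60.0, 0.0), (3L, 120.0, 0.0), (4L, 150.0, 0.0))
coarsen(pts, 100.0) // cuts the 150-unit way at node 3: Seq((1,3,120.0), (3,4,30.0))
```

The production code in this notebook does the same walk per triplet, but additionally rebuilds the vertex attributes (coordinate pair plus way map) for every new cut point.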
import org.apache.spark.graphx.{Edge => Edges}
val splittedEdges = weightedRoadGraph.triplets.flatMap{triplet => {
  def dist(lat1: Double, long1: Double, lat2: Double, long2: Double): Double = {
    val p1 = new Point(long1, lat1)
    val p2 = new Point(long2, lat2)
    geodesicDistanceOnWGS84(p1, p2)
  }
  val maxDist = 100
  var finalResult = Array[(Edges[(Long,  Double)], (Long, (Double, Double, Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])])), (Long, (Double, Double, Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])])))]()
  
  if(triplet.attr._2 > maxDist){                            
    val wayId = triplet.attr._1
    var wayNodesBuff = triplet.dstAttr._3(wayId)._1 
    var wayNodesBuffSize = wayNodesBuff.length
    
    if(wayNodesBuffSize > 0){
      var previousSrc = triplet.srcId

      var distance: Double = 0.0
      var currentBuff = Array[(Long, Double, Double)]()
      
      distance += dist(triplet.srcAttr._1, triplet.srcAttr._2, wayNodesBuff(0)._2, wayNodesBuff(0)._3) 
      
      var newVertex = (triplet.srcId, triplet.srcAttr)
      var previousVertex = newVertex
      
      if (distance > maxDist){
        newVertex = (wayNodesBuff(0)._1, (wayNodesBuff(0)._2, wayNodesBuff(0)._3, Map(wayId -> (Array[(Long, Double, Double)](), Array[(Long, Double, Double)]()))))
            
        finalResult +:= (Edges(previousSrc, wayNodesBuff(0)._1, (wayId, distance)), previousVertex, newVertex) 
        
        previousVertex = newVertex
        
        distance = 0
        previousSrc = wayNodesBuff(0)._1
      }
      else 
      {
        currentBuff +:= wayNodesBuff(0)
      }
         
      //loop through pairs of nodes in the way (in the buffer)
      if (wayNodesBuff.length > 1){
      wayNodesBuff.sliding(2).foreach{segment => {
        
        val tmp_dst = distance
        distance += dist(segment(0)._2, segment(0)._3, segment(1)._2, segment(1)._3)
        
        if (distance > maxDist)
        {
          if(segment(0)._1 != previousSrc){
              //      Vertex(nodeId,      (lat,                long,     Map(wayId->inBuff, outBuff)))
            newVertex = (segment(0)._1, (segment(0)._2, segment(0)._3, Map(wayId -> (currentBuff, Array[(Long, Double, Double)]()))) )

            //adds the edge to the array
            finalResult +:= (Edges(previousSrc, segment(0)._1, (wayId, tmp_dst)), previousVertex, newVertex)

            previousVertex = newVertex
            distance -= tmp_dst
            previousSrc = segment(0)._1
            currentBuff = Array[(Long, Double, Double)]()
          }    
        }
        else 
        {
          currentBuff +:= segment(0)
        }
      }}}
      
      
      //from last node in the inBuff to the dst
      val tmp_dist = distance
      distance += dist(wayNodesBuff.last._2, wayNodesBuff.last._3, triplet.dstAttr._1, triplet.dstAttr._2)
      if (distance > maxDist){
        if (wayNodesBuff.last._1 != previousSrc){
            newVertex = (wayNodesBuff.last._1, (wayNodesBuff.last._2, wayNodesBuff.last._3, Map(wayId -> (currentBuff, Array[(Long, Double, Double)]()))))
            finalResult +:= (Edges(previousSrc, wayNodesBuff.last._1, (wayId, tmp_dist)), previousVertex, newVertex) 
            previousVertex = newVertex
            distance -= tmp_dist
            previousSrc = wayNodesBuff.last._1 
            currentBuff = Array[(Long, Double, Double)]()
            newVertex = (triplet.dstId, (triplet.dstAttr._1, triplet.dstAttr._2, Map(wayId -> (currentBuff, triplet.dstAttr._3(wayId)._2))) )
        }
      }
      finalResult +:= (Edges(previousSrc, triplet.dstId, (wayId, distance)), previousVertex, newVertex)
      
    }
    // Distance > threshold but no nodes in the way (buffer)
    else
    {
      finalResult +:= (Edges(triplet.srcId, triplet.dstId, triplet.attr), (triplet.srcId, triplet.srcAttr), (triplet.dstId, triplet.dstAttr))
    }
  }
  // Distance < threshold
  else
  {
    finalResult +:= (Edges(triplet.srcId, triplet.dstId, triplet.attr), (triplet.srcId, triplet.srcAttr), (triplet.dstId, triplet.dstAttr))
  }
  
  // return
  finalResult
}}
import org.apache.spark.graphx.{Edge=>Edges}
splittedEdges: org.apache.spark.rdd.RDD[(org.apache.spark.graphx.Edge[(Long, Double)], (Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])])), (Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])])))] = MapPartitionsRDD[245] at flatMap at command-1211269020742721:2
splittedEdges.count() 
res28: Long = 734682
// Keep only the edge from each (edge, srcVertex, dstVertex) triple
val segmentedEdges = splittedEdges.flatMap{case(edge, srcVertex, dstVertex) => Array(edge)}
segmentedEdges.count() 
segmentedEdges: org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[(Long, Double)]] = MapPartitionsRDD[246] at flatMap at command-1211269020742724:2
res29: Long = 734682
// Taking the individual vertices
val segmentedVertices = splittedEdges.flatMap{case(edge, srcVertex, dstVertex) => Array(srcVertex) ++ Array(dstVertex)}

segmentedVertices.map(node => node._1).distinct().count()
segmentedVertices: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = MapPartitionsRDD[240] at flatMap at command-1211269020742727:2
res27: Long = 685121
// Converting the vertices to a df
val verticesDF = segmentedVertices.toDF("nodeId","attr").select($"nodeId",$"attr._1".as("lat"),$"attr._2".as("long"),explode($"attr._3"))
    .withColumnRenamed("key","wayId").withColumnRenamed("value","buffers")
    .select($"nodeId",$"lat",$"long",$"wayId",$"buffers._1".as("inBuff"),$"buffers._2".as("outBuff"))
  
verticesDF.show(1,false)
+----------+---------+------------------+---------+------+-------+
|nodeId    |lat      |long              |wayId    |inBuff|outBuff|
+----------+---------+------------------+---------+------+-------+
|5109322585|54.647108|25.128094200000003|137882502|[]    |[]     |
+----------+---------+------------------+---------+------+-------+
only showing top 1 row

verticesDF: org.apache.spark.sql.DataFrame = [nodeId: bigint, lat: double ... 4 more fields]
//unique wayIds of the edges
val nodesWayId = splittedEdges.map{case(edge, srcVertex, dstVertex) => edge.attr._1}.toDF("nodesWayId").dropDuplicates() 
nodesWayId: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [nodesWayId: bigint]
// Only vertices which have a wayId in their Map that is not included in any edge
// A dead end means there is no other intersection vertex in the way
val verticesWithDeadEndWays = verticesDF.join(nodesWayId, $"nodesWayId" === $"wayId", "leftanti") 
verticesWithDeadEndWays: org.apache.spark.sql.DataFrame = [nodeId: bigint, lat: double ... 4 more fields]
//convert df to rdd to be joined later with the rest of the vertices
import scala.collection.mutable.WrappedArray
val verticesWithDeadEndWaysRDD = verticesWithDeadEndWays.rdd.map(row => (row.getLong(0),(row.getDouble(1),row.getDouble(2),Map(row.getLong(3)-> (row.getAs[WrappedArray[(Long, Double, Double)]](4).array,row.getAs[WrappedArray[(Long, Double, Double)]](5).array)))))
import scala.collection.mutable.WrappedArray
verticesWithDeadEndWaysRDD: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = MapPartitionsRDD[264] at map at command-1211269020742731:3
// for a node appearing in different ways, returns one vertex for each way
val verticesWithSharedWays = splittedEdges.flatMap{case(edge, srcVertex, dstVertex) => 
  {
    val srcVertex1 = (srcVertex._1,(srcVertex._2._1,srcVertex._2._2,Map(edge.attr._1 -> srcVertex._2._3(edge.attr._1))))
    val dstVertex1 = (dstVertex._1,(dstVertex._2._1,dstVertex._2._2,Map(edge.attr._1 -> dstVertex._2._3(edge.attr._1))))

    Array(srcVertex1) ++ Array(dstVertex1)
  }}.distinct()
verticesWithSharedWays: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = MapPartitionsRDD[268] at distinct at command-1211269020742732:8
// Union verticesWithSharedWays with verticesWithDeadEndWaysRDD and reduce by key, merging the way maps
val allVertices = verticesWithSharedWays.union(verticesWithDeadEndWaysRDD).reduceByKey((a,b) => (a._1, a._2, a._3 ++ b._3)) 
allVertices.count()
allVertices: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = ShuffledRDD[270] at reduceByKey at command-1211269020742733:2
res34: Long = 685121
dbutils.fs.mkdirs("/_checkpoint1")
res36: Boolean = true
sc.setCheckpointDir("/_checkpoint1") // just a directory in distributed file system
allVertices.checkpoint()
segmentedEdges.checkpoint()
val coarsened_graph_100 = Graph(allVertices, segmentedEdges)
coarsened_graph_100: org.apache.spark.graphx.Graph[(Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]),(Long, Double)] = org.apache.spark.graphx.impl.GraphImpl@128b8420

ScaDaMaLe Course site and book

PageRank algorithm in the graph

Stavroula Rafailia Vlachou (LinkedIn), Virginia Jimenez Mohedano (LinkedIn) and Raazesh Sainudiin (LinkedIn).

This project was supported by SENSMETRY through a Data Science Project Internship 
between 2022-01-17 and 2022-06-05 to Stavroula R. Vlachou and Virginia J. Mohedano 
and Databricks University Alliance with infrastructure credits from AWS to 
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.

2022, Uppsala, Sweden
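Before running GraphX's distributed implementation below, the fixed point that PageRank computes can be illustrated with a tiny in-memory sketch. This is a toy version under stated assumptions: the graph, the function name pageRank, and the 20-iteration budget are all illustrative; the update rule mirrors the unnormalized form used by GraphX, with the same resetProb = 0.15 as in this notebook:

```scala
// Iterative PageRank on an edge list: each node's rank is split evenly
// among its out-edges, and incoming contributions are damped by resetProb.
def pageRank(edges: Seq[(Int, Int)], iters: Int, resetProb: Double = 0.15): Map[Int, Double] = {
  val nodes = edges.flatMap { case (s, d) => Seq(s, d) }.distinct
  val outDeg = edges.groupBy(_._1).mapValues(_.size)
  var ranks = nodes.map(_ -> 1.0).toMap
  for (_ <- 1 to iters) {
    // contribution flowing into each destination node
    val contribs = edges
      .map { case (s, d) => d -> ranks(s) / outDeg(s) }
      .groupBy(_._1).mapValues(_.map(_._2).sum)
    ranks = nodes.map(n => n -> (resetProb + (1 - resetProb) * contribs.getOrElse(n, 0.0))).toMap
  }
  ranks
}

// Node 3 has two in-links and ends up with the highest rank:
val r = pageRank(Seq((1, 3), (2, 3), (3, 1)), iters = 20)
```

On the road graph below the same computation runs distributed via PageRank.runUntilConvergence, which iterates until the rank changes fall under a tolerance instead of using a fixed iteration count.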

import crosby.binary.osmosis.OsmosisReader

import org.apache.hadoop.mapreduce.{TaskAttemptContext, JobContext}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

import org.openstreetmap.osmosis.core.container.v0_6.EntityContainer
import org.openstreetmap.osmosis.core.domain.v0_6._
import org.openstreetmap.osmosis.core.task.v0_6.Sink

import sqlContext.implicits._

import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.Map
import scala.collection.JavaConversions._

import org.apache.spark.sql.functions._
import org.apache.spark.graphx._
import crosby.binary.osmosis.OsmosisReader
import org.apache.hadoop.mapreduce.{TaskAttemptContext, JobContext}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.openstreetmap.osmosis.core.container.v0_6.EntityContainer
import org.openstreetmap.osmosis.core.domain.v0_6._
import org.openstreetmap.osmosis.core.task.v0_6.Sink
import sqlContext.implicits._
import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.Map
import scala.collection.JavaConversions._
import org.apache.spark.sql.functions._
import org.apache.spark.graphx._
spark.conf.set("spark.sql.parquet.binaryAsString", true)

val nodes_df = spark.read.parquet("dbfs:/datasets/osm/lithuania/lithuania.osm.pbf.node.parquet")

case class NodeEntry(nodeId: Long, latitude: Double, longitude: Double, tags: Seq[String])

val nodeDS = nodes_df.map(node => 
  NodeEntry(node.getAs[Long]("id"),
       node.getAs[Double]("latitude"),
       node.getAs[Double]("longitude"),
       node.getAs[Seq[Row]]("tags").map{case Row(key:String, value:String) => value}
)).cache()
nodes_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 7 more fields]
defined class NodeEntry
nodeDS: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]
nodeDS.show(10)
+--------+------------------+------------------+-----------------+
|  nodeId|          latitude|         longitude|             tags|
+--------+------------------+------------------+-----------------+
|15389886|        54.7309125|25.239701200000003|[traffic_signals]|
|15389895|54.732171400000006|25.243689500000002|               []|
|15389899|        54.7352788|        25.2467356|               []|
|15389959|        54.7355529|        25.2458712|               []|
|15389961|54.735927100000005|25.245138800000003|               []|
|15389967|54.741563400000004|25.238850600000003|               []|
|15390015|54.735093600000006|        25.2478942|               []|
|15390016|54.734942700000005|        25.2500417|               []|
|15390017|54.734759200000006|25.251196200000003|               []|
|15390018|        54.7344154|        25.2522184|               []|
+--------+------------------+------------------+-----------------+
only showing top 10 rows
val edges_0 = spark.read.parquet("/_checkpoint/edges_LT_initial")
val vertices_0 = spark.read.parquet("/_checkpoint/vertices_LT_initial")
edges_0: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 1 more field]
vertices_0: org.apache.spark.sql.DataFrame = [id: bigint, Map: struct<_1: double, _2: double ... 1 more field>]
import org.apache.spark.graphx.Graph
import org.graphframes.GraphFrame
val r = GraphFrame(vertices_0, edges_0)
import org.apache.spark.graphx.Graph
import org.graphframes.GraphFrame
r: org.graphframes.GraphFrame = GraphFrame(v:[id: bigint, Map: struct<_1: double, _2: double ... 1 more field>], e:[src: bigint, dst: bigint ... 1 more field])
import org.apache.spark.graphx.lib.PageRank 
val segmentedGraph = r.toGraphX
// Run PageRank until the ranks converge to within the given tolerance.

val new_ranks = PageRank.runUntilConvergence(segmentedGraph,tol=0.01,resetProb=0.15).cache()
import org.apache.spark.graphx.lib.PageRank
segmentedGraph: org.apache.spark.graphx.Graph[org.apache.spark.sql.Row,org.apache.spark.sql.Row] = org.apache.spark.graphx.impl.GraphImpl@6c4ec6ca
new_ranks: org.apache.spark.graphx.Graph[Double,Double] = org.apache.spark.graphx.impl.GraphImpl@74c3bb3
segmentedGraph.degrees.sortBy(-_._2).take(10)
res3: Array[(org.apache.spark.graphx.VertexId, Int)] = Array((509277216,10), (429416369,10), (495049689,10), (450263482,10), (417013574,10), (2043449705,9), (1495498282,9), (429511082,9), (2967263581,9), (91091176,9))
val top_ranks = new_ranks.vertices
top_ranks: org.apache.spark.graphx.VertexRDD[Double] = VertexRDDImpl[1581] at RDD at VertexRDD.scala:57
top_ranks.take(1)
res4: Array[(org.apache.spark.graphx.VertexId, Double)] = Array((1935599424,1.28014875095845))
val ranksDS = top_ranks.toDF("id", "PageRank")
ranksDS: org.apache.spark.sql.DataFrame = [id: bigint, PageRank: double]
import org.apache.spark.sql.functions._
val ranks_located = ranksDS.join(nodeDS, ranksDS("id") === nodeDS("nodeId"), "left_outer").orderBy(col("PageRank").desc)
import org.apache.spark.sql.functions._
ranks_located: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [id: bigint, PageRank: double ... 4 more fields]
ranks_located.show(10)
+----------+------------------+----------+------------------+------------------+----+
|        id|          PageRank|    nodeId|          latitude|         longitude|tags|
+----------+------------------+----------+------------------+------------------+----+
|2370300576| 8.108818241224267|2370300576|54.873593400000004|        24.0557738|  []|
|9053620686| 7.344862855346021|9053620686|        55.2685698|22.526177800000003|  []|
|9455107664| 7.259922232547948|9455107664|54.889957800000005|        23.8424611|  []|
|1804105454| 6.454593752088028|1804105454| 55.26873500000001|        22.5265232|  []|
|3722657621|6.4231667975781095|3722657621|        56.3114233|        22.2750071|  []|
| 460043992| 6.218147627390615| 460043992|          55.92036|23.292098000000003|  []|
| 834596837| 6.012281739910181| 834596837|        54.6699622|        25.3708519|  []|
| 293618407| 5.907439750914314| 293618407|55.718506500000004|21.479742700000003|  []|
|3722657743| 5.877135433049267|3722657743|        56.3125726|        22.2706684|  []|
| 552930949| 5.856110729504616| 552930949|54.862435700000006|        24.4702166|  []|
+----------+------------------+----------+------------------+------------------+----+
only showing top 10 rows
ranks_located.where(col("id") === "509277216").show()
+---------+------------------+---------+----------+------------------+-----------------+
|       id|          PageRank|   nodeId|  latitude|         longitude|             tags|
+---------+------------------+---------+----------+------------------+-----------------+
|509277216|2.6820163978577063|509277216|55.9684321|25.585430900000002|[traffic_signals]|
+---------+------------------+---------+----------+------------------+-----------------+
val degrees = segmentedGraph.degrees.sortBy(-_._2).toDF("id","degree")
degrees: org.apache.spark.sql.DataFrame = [id: bigint, degree: int]
ranks_located.join(degrees, ranks_located("id") === degrees("id")).show(10)
segmentedGraph.vertices.count
res20: Long = 191991
segmentedGraph.edges.count
res14: Long = 237069

Map-matching OpenStreetMap Nodes to OpenStreetMap Ways

Stavroula Rafailia Vlachou (LinkedIn) and Raazesh Sainudiin (LinkedIn).

This project was supported by SENSMETRY through a Data Science Project Internship 
between 2022-01-17 and 2022-06-05 to Stavroula R. Vlachou and 
Databricks University Alliance with infrastructure credits from AWS to 
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.

2022, Uppsala, Sweden

What is map-matching?

Map matching is the problem of how to match recorded geographic coordinates to a logical model of the real world, typically using some form of Geographic Information System.

See https://en.wikipedia.org/wiki/Map_matching.
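The simplest instance of the problem stated above is snapping one recorded point to the closest road segment. A minimal planar sketch in Scala — brute force over all segments, with Euclidean distance; the names Pt, distToSegment, and matchPoint are illustrative, whereas GeoMatch below does this at scale with spatial partitioning:

```scala
// Snap a point to the nearest road segment by point-to-segment distance.
case class Pt(x: Double, y: Double)

def distToSegment(p: Pt, a: Pt, b: Pt): Double = {
  val (dx, dy) = (b.x - a.x, b.y - a.y)
  val len2 = dx * dx + dy * dy
  // t is the clamped projection of p onto segment a-b (0 = at a, 1 = at b)
  val t = if (len2 == 0) 0.0
          else math.max(0.0, math.min(1.0, ((p.x - a.x) * dx + (p.y - a.y) * dy) / len2))
  math.hypot(p.x - (a.x + t * dx), p.y - (a.y + t * dy))
}

def matchPoint(p: Pt, segments: Map[Long, (Pt, Pt)]): Long =
  segments.minBy { case (_, (a, b)) => distToSegment(p, a, b) }._1

// Two parallel roads; the recorded point snaps to the closer one (way 10):
val roads = Map(10L -> (Pt(0, 0), Pt(10, 0)), 20L -> (Pt(0, 5), Pt(10, 5)))
matchPoint(Pt(3, 1), roads) // -> 10L
```

Scaling this up is exactly the hard part: a brute-force nearest-segment search is quadratic in the data sizes, which motivates the partitioning scheme described next.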

Map-Matching with GeoMatch

GeoMatch is a novel, scalable, and efficient big-data pipeline for large-scale map-matching on Apache Spark. It improves existing spatial big data solutions by utilizing a novel spatial partitioning scheme inspired by Hilbert space-filling curves.

The library can be found in the following git repository GeoMatch.

The necessary files to generate the jar for this work can be found in the following fork https://github.com/StavroulaVlachou/GeoMatch.

Read GeoMatch: Efficient Large-Scale Map Matching on Apache Spark
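The partitioning idea behind GeoMatch can be illustrated with the standard Hilbert-curve index computation: map a 2D grid cell to its 1D position along the curve, so that nearby cells tend to receive nearby indices and hence land in the same partition. This is a generic sketch of the textbook xy-to-index conversion for an n x n grid (n a power of two), not code taken from the GeoMatch library:

```scala
// Position of grid cell (x, y) along the Hilbert curve of an n x n grid.
def hilbertIndex(n: Int, xIn: Int, yIn: Int): Long = {
  var x = xIn; var y = yIn
  var d = 0L
  var s = n / 2
  while (s > 0) {
    // which quadrant of the current level the cell falls in
    val rx = if ((x & s) > 0) 1 else 0
    val ry = if ((y & s) > 0) 1 else 0
    d += s.toLong * s * ((3 * rx) ^ ry)
    // rotate/flip the quadrant so the curve stays continuous
    if (ry == 0) {
      if (rx == 1) { x = n - 1 - x; y = n - 1 - y }
      val t = x; x = y; y = t
    }
    s /= 2
  }
  d
}

// Adjacent cells get adjacent curve positions:
hilbertIndex(8, 0, 0) // -> 0
hilbertIndex(8, 0, 1) // -> 1
```

Sorting spatial objects by such an index before range-partitioning is what lets GeoMatch keep spatially close geometries on the same Spark partition.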

Instructions

git clone git@github.com:StavroulaVlachou/GeoMatch.git

cd Common

mvn compile install

cd ../GeoMatch

mvn compile install

The generated jar files can be found within the target directories. Then: 1. In Databricks choose Create -> Library and upload the packaged jars. 2. Create a Spark 2.4.0 - Scala 2.11 cluster with the uploaded GeoMatch library installed, or, if you are already running a cluster and have installed the uploaded library on it, detach and re-attach any notebook currently using that cluster.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.serializer.KryoSerializer
import org.cusp.bdi.gm.GeoMatch
import org.cusp.bdi.gm.geom.GMPoint
import org.cusp.bdi.gm.geom.GMLineString
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.serializer.KryoSerializer
import org.cusp.bdi.gm.GeoMatch
import org.cusp.bdi.gm.geom.GMPoint
import org.cusp.bdi.gm.geom.GMLineString
import crosby.binary.osmosis.OsmosisReader

import org.apache.hadoop.mapreduce.{TaskAttemptContext, JobContext}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

import org.openstreetmap.osmosis.core.container.v0_6.EntityContainer
import org.openstreetmap.osmosis.core.domain.v0_6._
import org.openstreetmap.osmosis.core.task.v0_6.Sink

import sqlContext.implicits._

import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.Map
import scala.collection.JavaConversions._
import org.apache.spark.graphx._
import magellan.Point

import crosby.binary.osmosis.OsmosisReader
import org.apache.hadoop.mapreduce.{TaskAttemptContext, JobContext}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.openstreetmap.osmosis.core.container.v0_6.EntityContainer
import org.openstreetmap.osmosis.core.domain.v0_6._
import org.openstreetmap.osmosis.core.task.v0_6.Sink
import sqlContext.implicits._
import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.Map
import scala.collection.JavaConversions._
import org.apache.spark.graphx._
import magellan.Point
ls /datasets/osm/uppsala
path name size
dbfs:/datasets/osm/uppsala/.uppsalaTinyR.pbf.node.parquet.crc .uppsalaTinyR.pbf.node.parquet.crc 172.0
dbfs:/datasets/osm/uppsala/.uppsalaTinyR.pbf.relation.parquet.crc .uppsalaTinyR.pbf.relation.parquet.crc 84.0
dbfs:/datasets/osm/uppsala/.uppsalaTinyR.pbf.way.parquet.crc .uppsalaTinyR.pbf.way.parquet.crc 84.0
dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf uppsalaTinyR.pbf 17867.0
dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf.node.parquet uppsalaTinyR.pbf.node.parquet 20829.0
dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf.relation.parquet uppsalaTinyR.pbf.relation.parquet 9394.0
dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf.way.parquet uppsalaTinyR.pbf.way.parquet 9542.0
dbfs:/datasets/osm/uppsala/uppsalaTinyV.osm.pbf uppsalaTinyV.osm.pbf 30606.0
  • Run the following command only once per cluster
java -jar /dbfs/FileStore/jars/2706d711_3963_4d88_92e7_a8870d0164d1-osm_parquetizer_1_0_1_SNAPSHOT-80d25.jar /dbfs/datasets/osm/uppsala/uppsalaTinyR.pbf
2022-04-08 09:42:42 INFO  CodecPool:153 - Got brand-new compressor [.snappy]
2022-04-08 09:42:47 INFO  CodecPool:153 - Got brand-new compressor [.snappy]
2022-04-08 09:42:47 INFO  CodecPool:153 - Got brand-new compressor [.snappy]
2022-04-08 09:42:53 INFO  App$MultiEntitySinkObserver:125 - Total entities processed: 896
ls /dbfs/datasets/osm/uppsala/
uppsalaTinyR.pbf
uppsalaTinyR.pbf.node.parquet
uppsalaTinyR.pbf.relation.parquet
uppsalaTinyR.pbf.way.parquet
uppsalaTinyV.osm.pbf
spark.conf.set("spark.sql.parquet.binaryAsString", true)

val nodes_df = spark.read.parquet("dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf.node.parquet")
val ways_df = spark.read.parquet("dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf.way.parquet")
nodes_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 7 more fields]
ways_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 6 more fields]
val allowableWays = Seq(
  "motorway",
  "motorway_link",
  "trunk",
  "trunk_link",
  "primary",
  "primary_link",
  "secondary",
  "secondary_link",
  "tertiary",
  "tertiary_link",
  "living_street",
  "residential",
  "road",
  "construction",
  "motorway_junction"
)
allowableWays: Seq[String] = List(motorway, motorway_link, trunk, trunk_link, primary, primary_link, secondary, secondary_link, tertiary, tertiary_link, living_street, residential, road, construction, motorway_junction)
//convert the nodes to Dataset containing the fields of interest

case class NodeEntry(nodeId: Long, latitude: Double, longitude: Double, tags: Seq[String])

val nodeDS = nodes_df.map(node => 
  NodeEntry(node.getAs[Long]("id"),
       node.getAs[Double]("latitude"),
       node.getAs[Double]("longitude"),
       node.getAs[Seq[Row]]("tags").map{case Row(key:String, value:String) => value}
)).cache()
defined class NodeEntry
nodeDS: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]
//convert the ways to Dataset containing the fields of interest

case class WayEntry(wayId: Long, tags: Array[String], nodes: Array[Long])

val wayDS = ways_df.flatMap(way => {
        val tagSet = way.getAs[Seq[Row]]("tags").map{case Row(key:String, value:String) =>  value}.toArray
        if (tagSet.intersect(allowableWays).nonEmpty ){
            Array(WayEntry(way.getAs[Long]("id"),
            tagSet,
            way.getAs[Seq[Row]]("nodes").map{case Row(index:Integer, nodeId:Long) =>  nodeId}.toArray
            ))
        }
        else { Array[WayEntry]()}
}
).cache()
defined class WayEntry
wayDS: org.apache.spark.sql.Dataset[WayEntry] = [wayId: bigint, tags: array<string> ... 1 more field]
val distinctNodesWays = wayDS.flatMap(_.nodes).distinct //the distinct nodes within the ways 
distinctNodesWays: org.apache.spark.sql.Dataset[Long] = [value: bigint]
val wayNodes = nodeDS.as("nodes") //nodes that are in a way + nodes info from nodeDS
  .joinWith(distinctNodesWays.as("ways"), $"ways.value" === $"nodes.nodeId")
  .map(_._1).cache
wayNodes: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]
import org.apache.spark.sql.types.StringType
import org.apache.spark.sql.functions.concat_ws
import org.apache.spark.sql.functions._

val nodes = wayDS.
  select($"wayId", $"nodes").
  withColumn("node", explode($"nodes")).
  drop("nodes")
val wayNodesLocated = nodes
  .join(wayNodes, wayNodes.col("nodeId") === nodes.col("node"))
  .select($"wayId", $"node", $"latitude", $"longitude")
  .groupBy("wayId")
  .agg(collect_list(concat($"latitude", lit(" "), $"longitude")).alias("list_of_coordinates"))
  .withColumn("coordinates_str", concat_ws(",", col("list_of_coordinates")))
  .drop("list_of_coordinates")
wayNodesLocated.show(1, false)
+---------+----------------------------------------------------------+
|wayId    |coordinates_str                                           |
+---------+----------------------------------------------------------+
|393182257|59.8569759 17.644382,59.857381800000006 17.645299100000003|
+---------+----------------------------------------------------------+
only showing top 1 row

import org.apache.spark.sql.types.StringType
import org.apache.spark.sql.functions.concat_ws
import org.apache.spark.sql.functions._
nodes: org.apache.spark.sql.DataFrame = [wayId: bigint, node: bigint]
wayNodesLocated: org.apache.spark.sql.DataFrame = [wayId: bigint, coordinates_str: string]
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
if(!ArcGISRuntimeEnvironment.isInitialized())
    {
      ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
      ArcGISRuntimeEnvironment.initialize() 
    }
Initializing...
Java version : 1.8.0_282 (Azul Systems, Inc.) amd64
def project_to_meters(lon: String, lat: String): String = { 
    
    if(!ArcGISRuntimeEnvironment.isInitialized())
    {
      ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
      ArcGISRuntimeEnvironment.initialize() 
    }
  
    val initial_point = new Point(lon.toDouble, lat.toDouble, SpatialReference.create(4326)) //WGS84
    val reprojection = GeometryEngine.project(initial_point, SpatialReference.create(3035))  //European Grid
    reprojection.toString
}
spark.udf.register("project_to_meters", project_to_meters(_:String, _:String):String)
project_to_meters: (lon: String, lat: String)String
res9: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,StringType,Some(List(StringType, StringType)))
val ways_reprojected = wayNodesLocated.rdd
  .map(line => line.toString.replaceAll("\\[","").replaceAll("\\]",""))
  .map(line => {
    val parts = line.replaceAll("\"","").split(",")
    val arrCoords = parts.slice(1, parts.length).map(xyStr => {
      val xy = xyStr.split(' ')
      val reprojection = project_to_meters(xy(1).toString, xy(0).toString)
      val coords = reprojection.replaceAll(",","").replaceAll("\\[","").split(" ").slice(1, reprojection.length)
      val xy_new = coords(0).toString + " " + coords(1).toString
      xy_new
    })
    ("LineString" + " " + parts(0).toString, arrCoords)
  })
val waysDF = ways_reprojected.toDF("LineStringId","coords")
val ways_unpacked = waysDF.select(col("LineStringId"),concat_ws(",",col("coords"))).rdd.map(line => line.toString.replaceAll("\\[","").replaceAll("\\]",""))
ways_unpacked.take(1)
ways_reprojected: org.apache.spark.rdd.RDD[(String, Array[String])] = MapPartitionsRDD[55] at map at command-4438247265478911:1
waysDF: org.apache.spark.sql.DataFrame = [LineStringId: string, coords: array<string>]
ways_unpacked: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[61] at map at command-4438247265478911:3
res10: Array[String] = Array(LineString 393182257,4749494.332253 4107152.617124,4749540.389628 4107203.021679)
val rddFirst = ways_unpacked.mapPartitions(_.map(line => {
  val parts = line.replaceAll("\"","").split(',')
  val arrCoords = parts.slice(1, parts.length).map(xyStr => {
    val xy = xyStr.split(' ')
    (xy(0).toDouble.toInt, xy(1).toDouble.toInt)
  })
  new GMLineString(parts(0), arrCoords)
}))
rddFirst: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMLineString] = MapPartitionsRDD[62] at mapPartitions at command-4438247265478912:1
rddFirst.take(1)
res12: Array[org.cusp.bdi.gm.geom.GMLineString] = Array(GMLineString(LineString 393182257,[Lscala.Tuple2;@16eea034))
val rddFirstSet = sc.textFile("FileStore/tables/UUways.csv").mapPartitions(_.map(line =>{val parts = line.replaceAll("\"","").split(',');val arrCoords = parts.slice(1, parts.length).map(xyStr => {val xy = xyStr.split(' ');(xy(0).toDouble.toInt, xy(1).toDouble.toInt)});new GMLineString(parts(0), arrCoords)}))
rddFirstSet: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMLineString] = MapPartitionsRDD[65] at mapPartitions at command-374221935645076:1
rddFirstSet.take(1)
res13: Array[org.cusp.bdi.gm.geom.GMLineString] = Array(GMLineString(LineString 393182257,[Lscala.Tuple2;@2aa93df3))
rddFirstSet.count() //9 ways 
res14: Long = 9
val rddSecondSet = sc.textFile("FileStore/tables/UUnodes.csv").mapPartitions(_.map(line => {val parts = line.replaceAll("\"","").split(',');new GMPoint(parts(0), (parts(1).toDouble.toInt, parts(2).toDouble.toInt))}))
rddSecondSet: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMPoint] = MapPartitionsRDD[68] at mapPartitions at command-432075383419156:1
rddSecondSet.take(1)
res15: Array[org.cusp.bdi.gm.geom.GMPoint] = Array(GMPoint(Point 312352,(4749694,4107105)))
rddSecondSet.count() //626 nodes to be map-matched 
res16: Long = 626
val geoMatch = new GeoMatch(false, 16, 150, (-1, -1, -1, -1)) // n (= dimension of the Hilbert curve) should be a power of 2.
geoMatch: org.cusp.bdi.gm.GeoMatch = GeoMatch(false,16,150.0,(-1,-1,-1,-1))
val resultRDD = geoMatch.spatialJoinKNN(rddFirst, rddSecondSet, 1, false)
resultRDD: org.apache.spark.rdd.RDD[(org.cusp.bdi.gm.geom.GMPoint, scala.collection.mutable.ListBuffer[org.cusp.bdi.gm.geom.GMLineString])] = MapPartitionsRDD[83] at mapPartitions at GeoMatch.scala:94
resultRDD.filter(element => (element._2.isEmpty)).count()  //number of nodes that are not matched successfully
res19: Long = 44
resultRDD.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => !(element._2.isEmpty)).toDF("pointId", "matchId").show(5, false)
+---------------+----------------------+
|pointId        |matchId               |
+---------------+----------------------+
|Point 312363   |[LineString 263934971]|
|Point 25724030 |[LineString 263934971]|
|Point 25735257 |[LineString 263934973]|
|Point 25812013 |[LineString 263934971]|
|Point 390925129|[LineString 263934971]|
+---------------+----------------------+
only showing top 5 rows

ScaDaMaLe Course site and book

Map-matching OpenStreetMap Nodes to Road Graph elements

Stavroula Rafailia Vlachou (LinkedIn) and Raazesh Sainudiin (LinkedIn).

This project was supported by SENSMETRY through a Data Science Project Internship 
between 2022-01-17 and 2022-06-05 to Stavroula R. Vlachou and
Databricks University Alliance with infrastructure credits from AWS to 
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.

2022, Uppsala, Sweden

import org.apache.spark.graphx._
import sqlContext.implicits._
import scala.collection.JavaConversions._
import org.apache.spark.sql.functions.{concat, lit}


import org.cusp.bdi.gm.GeoMatch
import org.cusp.bdi.gm.geom.GMPoint
import org.cusp.bdi.gm.geom.GMLineString
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
import org.apache.spark.graphx._
import sqlContext.implicits._
import scala.collection.JavaConversions._
import org.apache.spark.sql.functions.{concat, lit}
import org.cusp.bdi.gm.GeoMatch
import org.cusp.bdi.gm.geom.GMPoint
import org.cusp.bdi.gm.geom.GMLineString
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
val edges = spark.read.parquet("dbfs:/graphs/uppsala/edges")
val vertices = spark.read.parquet("dbfs:/graphs/uppsala/vertices").toDF("vertexId", "latitude", "longitude")
edges: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint]
vertices: org.apache.spark.sql.DataFrame = [vertexId: bigint, latitude: double ... 1 more field]
val src_coordinates = edges.join(vertices,vertices("vertexId") === edges("src"), "left_outer").drop("vertexId").withColumnRenamed("latitude", "src_latitude").withColumnRenamed("longitude","src_longitude")
val edge_coordinates = src_coordinates.join(vertices,vertices("vertexId") === src_coordinates("dst")).drop("vertexId").withColumnRenamed("latitude", "dst_latitude").withColumnRenamed("longitude", "dst_longitude")
src_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 2 more fields]
edge_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 4 more fields]
val concat_coordinates = edge_coordinates.select($"src",concat($"src_latitude",lit(" "),$"src_longitude").alias("src_coordinates"), $"dst",concat($"dst_latitude",lit(" "),$"dst_longitude").alias("dst_coordinates"))
concat_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, src_coordinates: string ... 2 more fields]
val linestring_coordinates = concat_coordinates.select($"src", $"dst",concat($"src_coordinates", lit(","), $"dst_coordinates").alias("list_of_coordinates"))
linestring_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 1 more field]
val first = linestring_coordinates.select(concat(lit("LineString:"),$"src",lit("+"), $"dst").alias("LineString"),$"list_of_coordinates")
first: org.apache.spark.sql.DataFrame = [LineString: string, list_of_coordinates: string]
val first_rdd = first.rdd
first_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[124] at rdd at command-4069571511113730:1
if(!ArcGISRuntimeEnvironment.isInitialized())
    {
      ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
      ArcGISRuntimeEnvironment.initialize() 
    }
def project_to_meters(lon: String, lat: String): String = { 
    
    if(!ArcGISRuntimeEnvironment.isInitialized())
    {
      ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
      ArcGISRuntimeEnvironment.initialize() 
    }
  
    val initial_point = new Point(lon.toDouble, lat.toDouble, SpatialReference.create(4326))
    val reprojection = GeometryEngine.project(initial_point, SpatialReference.create(3035))
    reprojection.toString
}
spark.udf.register("project_to_meters", project_to_meters(_:String, _:String):String)
project_to_meters: (lon: String, lat: String)String
res9: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,StringType,Some(List(StringType, StringType)))
val ways_reprojected = first_rdd
  .map(line => line.toString.replaceAll("\\[","").replaceAll("\\]",""))
  .map(line => {
    val parts = line.replaceAll("\"","").split(",")
    val arrCoords = parts.slice(1, parts.length).map(xyStr => {
      val xy = xyStr.split(' ')
      val reprojection = project_to_meters(xy(1).toString, xy(0).toString)
      val coords = reprojection.replaceAll(",","").replaceAll("\\[","").split(" ").slice(1, reprojection.length)
      val xy_new = coords(0).toString + " " + coords(1).toString
      xy_new
    })
    (parts(0).toString, arrCoords)
  })
ways_reprojected: org.apache.spark.rdd.RDD[(String, Array[String])] = MapPartitionsRDD[126] at map at command-4069571511113735:1
val ways_unpacked = ways_reprojected.map(item => item._1.toString + "," + item._2(0).toString + "," + item._2(1).toString)
ways_unpacked: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[127] at map at command-4069571511113736:1
val rdd_first_set = ways_unpacked.mapPartitions(_.map(line =>{val parts = line.replaceAll("\"","").split(',');val arrCoords = parts.slice(1, parts.length).map(xyStr => {val xy = xyStr.split(' ');(xy(0).toDouble.toInt, xy(1).toDouble.toInt)});new GMLineString(parts(0), arrCoords)}))
rdd_first_set: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMLineString] = MapPartitionsRDD[128] at mapPartitions at command-4069571511113737:1
rdd_first_set.take(1)
res10: Array[org.cusp.bdi.gm.geom.GMLineString] = Array(GMLineString(LineString:312363+25735257,[Lscala.Tuple2;@7336e6f4))
// Extract the projected latitude/longitude from the reprojected point's string form
def unpack_lat(str: String): String =
  str.replaceAll(",", "").replaceAll("\\[", "").split(" ")(2)

spark.udf.register("unpack_lat", unpack_lat(_: String): String)

def unpack_lon(str: String): String =
  str.replaceAll(",", "").replaceAll("\\[", "").split(" ")(1)

spark.udf.register("unpack_lon", unpack_lon(_: String): String)
unpack_lat: (str: String)String
unpack_lon: (str: String)String
res11: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(StringType)))
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
val initial_points = vertices.toDF().select(col("vertexId").cast(StringType), col("latitude").cast(StringType), col("longitude").cast(StringType)).withColumn("Point", lit("Point "))
val reprojected_points = initial_points.selectExpr("concat(Point,vertexId) as PointId","project_to_meters(longitude, latitude) as reprojection")
val unpacked_reprojection = reprojected_points.selectExpr("PointId","unpack_lat(reprojection) as new_lat", "unpack_lon(reprojection) as new_lon").rdd
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
initial_points: org.apache.spark.sql.DataFrame = [vertexId: string, latitude: string ... 2 more fields]
reprojected_points: org.apache.spark.sql.DataFrame = [PointId: string, reprojection: string]
unpacked_reprojection: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[133] at rdd at command-2803386459776172:5
unpacked_reprojection.take(1)
res13: Array[org.apache.spark.sql.Row] = Array([Point 25812013,4107235.859946,4749331.992325])
val f = unpacked_reprojection.map(line => {val id = line(0).toString; val lat = line(1).toString; val lon = line(2).toString;id+"," + lat +","+ lon})
f: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[134] at map at command-2803386459776174:1
val rddSecondSet = f.mapPartitions(_.map(line => {val parts = line.replaceAll("\"","").split(',');new GMPoint(parts(0), (parts(2).toDouble.toInt, parts(1).toDouble.toInt))}))
rddSecondSet: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMPoint] = MapPartitionsRDD[135] at mapPartitions at command-2803386459776169:1
rddSecondSet.take(1)
res14: Array[org.cusp.bdi.gm.geom.GMPoint] = Array(GMPoint(Point 25812013,(4749331,4107235)))
val geoMatch = new GeoMatch(false, 16, 150, (-1, -1, -1, -1)) // n (= dimension of the Hilbert curve) should be a power of 2.
geoMatch: org.cusp.bdi.gm.GeoMatch = GeoMatch(false,16,150.0,(-1,-1,-1,-1))
val resultRDD = geoMatch.spatialJoinKNN(rdd_first_set, rddSecondSet, 1, false)
resultRDD: org.apache.spark.rdd.RDD[(org.cusp.bdi.gm.geom.GMPoint, scala.collection.mutable.ListBuffer[org.cusp.bdi.gm.geom.GMLineString])] = MapPartitionsRDD[150] at mapPartitions at GeoMatch.scala:94
resultRDD.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => !(element._2.isEmpty)).take(10)
res15: Array[(String, scala.collection.mutable.ListBuffer[String])] = Array((Point 25735257,ListBuffer(LineString:25735257+3067700641)), (Point 312363,ListBuffer(LineString:3067700668+312363)), (Point 3067700641,ListBuffer(LineString:25735257+3067700641)), (Point 3963994985,ListBuffer(LineString:3963994985+25735257)), (Point 312353,ListBuffer(LineString:312353+25734373)), (Point 3067700668,ListBuffer(LineString:3067700668+312363)), (Point 2206536278,ListBuffer(LineString:3067700641+2206536278)), (Point 25734373,ListBuffer(LineString:25734373+3431600977)))
resultRDD.toDF("k", "line").show(10, false)
+--------------------------------------+------------------------------------------------------------------------------+
|k                                     |line                                                                          |
+--------------------------------------+------------------------------------------------------------------------------+
|[Point 455006648, [4749516, 4107261]] |[]                                                                            |
|[Point 25735257, [4749494, 4107152]]  |[[LineString:25735257+3067700641, [[4749494, 4107152], [4749524, 4107127]]]]  |
|[Point 312363, [4749423, 4107214]]    |[[LineString:3067700668+312363, [[4749419, 4107218], [4749423, 4107214]]]]    |
|[Point 3067700641, [4749524, 4107127]]|[[LineString:25735257+3067700641, [[4749494, 4107152], [4749524, 4107127]]]]  |
|[Point 3431600977, [4749699, 4107100]]|[]                                                                            |
|[Point 3963994985, [4749540, 4107203]]|[[LineString:3963994985+25735257, [[4749540, 4107203], [4749494, 4107152]]]]  |
|[Point 312353, [4749573, 4107212]]    |[[LineString:312353+25734373, [[4749573, 4107212], [4749648, 4107146]]]]      |
|[Point 3067700668, [4749419, 4107218]]|[[LineString:3067700668+312363, [[4749419, 4107218], [4749423, 4107214]]]]    |
|[Point 25812013, [4749331, 4107235]]  |[]                                                                            |
|[Point 2206536278, [4749587, 4107073]]|[[LineString:3067700641+2206536278, [[4749524, 4107127], [4749587, 4107073]]]]|
+--------------------------------------+------------------------------------------------------------------------------+
only showing top 10 rows

Map-Matching Events on a State Space / Road Graph with GeoMatch

Stavroula Rafailia Vlachou (LinkedIn) and Raazesh Sainudiin (LinkedIn).

This project was supported by SENSMETRY through a Data Science Project Internship 
between 2022-01-17 and 2022-06-05 to Stavroula R. Vlachou and
Databricks University Alliance with infrastructure credits from AWS to 
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.

2022, Uppsala, Sweden

Map-Matching with GeoMatch

GeoMatch is a novel, scalable, and efficient big-data pipeline for large-scale map-matching on Apache Spark. It improves existing spatial big data solutions by utilizing a novel spatial partitioning scheme inspired by Hilbert space-filling curves.
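To give a feel for the partitioning idea, here is a minimal plain-Scala sketch of a Hilbert-curve cell index (the textbook algorithm, not GeoMatch's internal implementation): cells that are close on the plane tend to receive close indices along the curve, which is what makes the index attractive as a spatial partitioning key.

```scala
object HilbertSketch {
  /** Distance along the Hilbert curve for cell (x, y) on an n x n grid;
    * n must be a power of 2. */
  def xy2d(n: Int, x0: Int, y0: Int): Long = {
    var x = x0; var y = y0
    var d = 0L
    var s = n / 2
    while (s > 0) {
      val rx = if ((x & s) > 0) 1 else 0
      val ry = if ((y & s) > 0) 1 else 0
      d += s.toLong * s * ((3 * rx) ^ ry)
      // rotate/flip the quadrant so each sub-curve is traversed consistently
      if (ry == 0) {
        if (rx == 1) { x = n - 1 - x; y = n - 1 - y }
        val t = x; x = y; y = t
      }
      s /= 2
    }
    d
  }
}
```

For n = 4 the sixteen cells receive the indices 0..15 exactly once, tracing the familiar U-shaped recursive pattern.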

The library can be found in the following git repository GeoMatch.

The necessary files to generate the jar for this work can be found in the following fork https://github.com/StavroulaVlachou/GeoMatch.

Instructions

git clone git@github.com:StavroulaVlachou/GeoMatch.git

cd Common

mvn compile install

cd ../GeoMatch

mvn compile install

The generated jar files can be found within the target directories. Then:

1. In Databricks, choose Create -> Library and upload the packaged jars.
2. Create a Spark 2.4.0 - Scala 2.11 cluster with the uploaded GeoMatch library installed. If you are already running a cluster and have installed the uploaded library on it, you have to detach and re-attach any notebook currently using that cluster.

//This allows easy embedding of publicly available information into any other notebook
//when viewing in git-book just ignore this block - you may have to manually chase the URL in frameIt("URL").
//Example usage:
// displayHTML(frameIt("https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation#Topics_in_LDA",250))
def frameIt( u:String, h:Int ) : String = {
      """<iframe 
 src=""""+ u+""""
 width="95%" height="""" + h + """"
 sandbox>
  <p>
    <a href="http://spark.apache.org/docs/latest/index.html">
      Fallback link for browsers that, unlikely, don't support frames
    </a>
  </p>
</iframe>"""
   }
displayHTML(frameIt("https://en.wikipedia.org/wiki/Map_matching",600))
import org.apache.spark.graphx._
import sqlContext.implicits._
import org.apache.spark.sql.functions._
import scala.collection.JavaConversions._
import org.cusp.bdi.gm.GeoMatch
import org.cusp.bdi.gm.geom.GMPoint
import org.cusp.bdi.gm.geom.GMLineString
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
import org.apache.spark.graphx._
import sqlContext.implicits._
import org.apache.spark.sql.functions._
import scala.collection.JavaConversions._
import org.cusp.bdi.gm.GeoMatch
import org.cusp.bdi.gm.geom.GMPoint
import org.cusp.bdi.gm.geom.GMLineString
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._

State Space / Road Graph

  • In this work, we wish to match points of interest - events - against states of a State Space. The State Space consists of elements of the Road Graph. Specifically, a state is either a vertex that corresponds to an intersection point or an edge which is essentially a road segment.
  • First we obtain the nodes and ways of the underlying road network.
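The two kinds of states can be sketched as a small algebraic data type (hypothetical names, for illustration only); the labels mirror the strings this notebook produces, e.g. `LineString:312353+25734373` for an edge.

```scala
// A state is either an intersection vertex or a road-segment edge
sealed trait RoadState
final case class IntersectionState(nodeId: Long) extends RoadState
final case class SegmentState(srcId: Long, dstId: Long) extends RoadState

object RoadState {
  // Label states the way this notebook does: "LineString:<node>" for an
  // intersection vertex and "LineString:<src>+<dst>" for a road-segment edge
  def label(state: RoadState): String = state match {
    case IntersectionState(id)  => s"LineString:$id"
    case SegmentState(src, dst) => s"LineString:$src+$dst"
  }
}
```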
spark.conf.set("spark.sql.parquet.binaryAsString", true)
val nodes_df = spark.read.parquet("dbfs:/datasets/osm/lithuania/lithuania.osm.pbf.node.parquet")
val ways_df = spark.read.parquet("dbfs:/datasets/osm/lithuania/lithuania.osm.pbf.way.parquet")
nodes_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 7 more fields]
ways_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 6 more fields]
//convert the nodes to Dataset containing the fields of interest
case class NodeEntry(nodeId: Long, latitude: Double, longitude: Double, tags: Seq[String])

val nodeDS = nodes_df.map(node => 
  NodeEntry(node.getAs[Long]("id"),
       node.getAs[Double]("latitude"),
       node.getAs[Double]("longitude"),
       node.getAs[Seq[Row]]("tags").map{case Row(key:String, value:String) => value}
))
defined class NodeEntry
nodeDS: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]
  • The next step is to obtain the intersection points and associate them with their corresponding vertices on the graph.
val intersections = spark.read.parquet("dbfs:/LT/intersections")
intersections: org.apache.spark.sql.DataFrame = [intersectionNode: bigint]
intersections.count //in this area there are 162325 intersection points 
res3: Long = 162325
  • GeoMatch deals with points whose coordinates are measured in meters. However, OSM data have their coordinates expressed in degrees (WGS84 - spatial reference index 4326). Thus, for each point that is to participate in the matching, we identify its OSM coordinates and reproject them onto the European Grid (spatial reference index 3035).
val intersection_points = nodeDS.join(intersections, intersections("intersectionNode") === nodeDS("nodeId")).drop("tags", "nodeId").select("intersectionNode", "latitude", "longitude")
intersection_points: org.apache.spark.sql.DataFrame = [intersectionNode: bigint, latitude: double ... 1 more field]
val concat_coordinates = intersection_points.select($"intersectionNode",concat($"latitude",lit(" "),$"longitude").alias("coordinates"))
concat_coordinates: org.apache.spark.sql.DataFrame = [intersectionNode: bigint, coordinates: string]
val firstIntersectionStates = concat_coordinates.select(concat(lit("LineString:"),$"intersectionNode").alias("LineString"),$"coordinates")
val firstIntersectionStates_rdd = firstIntersectionStates.rdd
firstIntersectionStates: org.apache.spark.sql.DataFrame = [LineString: string, coordinates: string]
firstIntersectionStates_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[318] at rdd at command-3336180278405410:2
if(!ArcGISRuntimeEnvironment.isInitialized())
    {
      ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
      ArcGISRuntimeEnvironment.initialize() 
    }
def project_to_meters(lon: String, lat: String): String = { 
    
    if(!ArcGISRuntimeEnvironment.isInitialized())
    {
      ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
      ArcGISRuntimeEnvironment.initialize() 
    }
  
    val initial_point = new Point(lon.toDouble, lat.toDouble, SpatialReference.create(4326))
    val reprojection = GeometryEngine.project(initial_point, SpatialReference.create(3035))
    reprojection.toString
}
spark.udf.register("project_to_meters", project_to_meters(_:String, _:String):String)
project_to_meters: (lon: String, lat: String)String
res8: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,StringType,Some(List(StringType, StringType)))
val intersections_reprojected = firstIntersectionStates_rdd.map(line => line.toString.replaceAll("\\[","").replaceAll("\\]",""))
              .map(line => {val parts = line.replaceAll("\"","").split(",");
                            val arrCoords = parts.slice(1,parts.length)
              .map(xyStr => {val xy = xyStr.split(" ");
                             val reprojection = project_to_meters(xy(1).toString, xy(0).toString);
                             val coords = reprojection.replaceAll(",","").replaceAll("\\[","").split(" ").slice(1,reprojection.length);
                             val xy_new = coords(0).toString +" "+ coords(1).toString;xy_new});
                            (parts(0).toString, arrCoords)})
intersections_reprojected: org.apache.spark.rdd.RDD[(String, Array[String])] = MapPartitionsRDD[320] at map at command-3336180278405413:2
val intersections_unpacked = intersections_reprojected.map(item => item._1.toString + "," + item._2(0).toString)
intersections_unpacked: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[321] at map at command-3336180278405414:1
val rdd_first_set_intersections = intersections_unpacked.mapPartitions(_.map(line =>{val parts = line.replaceAll("\"","").split(',');val arrCoords = parts.slice(1, parts.length).map(xyStr => {val xy = xyStr.split(' ');(xy(0).toDouble.toInt, xy(1).toDouble.toInt)});new GMPoint(parts(0), arrCoords(0))}))
rdd_first_set_intersections: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMPoint] = MapPartitionsRDD[322] at mapPartitions at command-3336180278405415:1
  • The next step is to fetch the events that are to be map-matched and transform their coordinates as well. Note that for this work, the events of interest are accidents recorded within Lithuania's road network.
val events = spark.read.format("csv").load("/FileStore/tables/LTnodes.csv").rdd.map(line => line.toString)
events.count() //there are 11989 events to be matched 
events: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[336] at map at command-3336180278405417:1
res9: Long = 11989
val all_accidents = spark.read.format("csv").load("/FileStore/tables/LTnodes.csv").toDF("PointId", "longitude", "latitude")
all_accidents: org.apache.spark.sql.DataFrame = [PointId: string, longitude: string ... 1 more field]
val rddSecondSet = events.mapPartitions(_.map(line => {val parts = line.replaceAll("\"","").replaceAll("\\[","").replaceAll("\\]","").split(',');new GMPoint(parts(0), (parts(1).toDouble.toInt, parts(2).toDouble.toInt))}))
rddSecondSet: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMPoint] = MapPartitionsRDD[345] at mapPartitions at command-3336180278405419:1

1st round of Map Matching

  • In this first round the focus is around the intersection points and the events occurring within a predefined distance from them. Here the distance tolerance is set to 20 meters and the number of neighbours to be found is 1.
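Conceptually, what `spatialJoinKNN` with k = 1 and a distance tolerance computes can be sketched as a brute-force join in plain Scala (illustrative only; GeoMatch avoids the quadratic scan through its Hilbert-curve partitioning):

```scala
object KnnSketch {
  // (id, x, y) with coordinates in meters (e.g. after reprojection to EPSG:3035)
  type Pt = (String, Double, Double)

  // For each event keep its single nearest state (k = 1), but only if it lies
  // within the distance tolerance; otherwise the event is carried along unmatched
  def matchWithin(events: Seq[Pt], states: Seq[Pt], tolMeters: Double): Seq[(String, Option[String])] =
    events.map { case (eid, ex, ey) =>
      val hit =
        if (states.isEmpty) None
        else {
          val (sid, sx, sy) = states.minBy { case (_, x, y) => math.hypot(ex - x, ey - y) }
          if (math.hypot(ex - sx, ey - sy) <= tolMeters) Some(sid) else None
        }
      (eid, hit)
    }
}
```

With a tolerance of 20 meters, an event 5 m from its nearest intersection is matched, while one 40 m away is returned with no match.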
val geoMatch = new GeoMatch(false, 256, 20, (-1, -1, -1, -1)) // dimension of the Hilbert curve = 256 (default value); should be a power of 2.
geoMatch: org.cusp.bdi.gm.GeoMatch = GeoMatch(false,256,20.0,(-1,-1,-1,-1))
val resultRDD = geoMatch.spatialJoinKNN(rdd_first_set_intersections, rddSecondSet, 1, false)
resultRDD: org.apache.spark.rdd.RDD[(org.cusp.bdi.gm.geom.GMPoint, scala.collection.mutable.ListBuffer[org.cusp.bdi.gm.geom.GMPoint])] = MapPartitionsRDD[358] at mapPartitions at GeoMatch.scala:94
  • 3743 events (out of 11989) are found to be within a 20 meter distance radius from intersection points.
resultRDD.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => !(element._2.isEmpty)).count()
res11: Long = 3743
val result_first_round = resultRDD.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => !(element._2.isEmpty)).map(element => (element._1, element._2(0))).toDF("PointId", "State")
result_first_round: org.apache.spark.sql.DataFrame = [PointId: string, State: string]
val intersection_counts = result_first_round.groupBy("State").count
intersection_counts: org.apache.spark.sql.DataFrame = [State: string, count: bigint]
  • One of the advantages of GeoMatch is that it carries all of the data points that are to be matched throughout the pipeline, even when no match is found. This is key here, since the points that were not matched successfully during this first round are subject to a second iteration, where they are matched against the remainder of the State Space.
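This carry-through property can be illustrated with a tiny plain-Scala sketch (hypothetical ids): because unmatched events still appear in the result, with an empty match list, they can simply be filtered out and fed to the second round.

```scala
object SecondRoundSketch {
  // Toy first-round result: each event id with its matched state, if any
  // (hypothetical ids, for illustration)
  val firstRound: Seq[(String, Option[String])] =
    Seq(("e1", Some("s1")), ("e2", None), ("e3", Some("s2")))

  // Unmatched events survive the first round and are collected for round two
  val (matched, unmatched) = firstRound.partition(_._2.nonEmpty)
  val secondRoundIds: Seq[String] = unmatched.map(_._1)
}
```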
val unmatched_events = resultRDD.filter(element => (element._2.isEmpty)).map(element => element._1.payload).toDF("id")
val second_set_second_round = unmatched_events.join(all_accidents, unmatched_events("id") === all_accidents("PointId")).drop("id").rdd.map(line => line.toString)
val rddSecondSetSecondRound = second_set_second_round
.mapPartitions(_.map(line => {val parts = line.replaceAll("\"","").replaceAll("\\[","").replaceAll("\\]","").split(',');
                              new GMPoint(parts(0),(parts(1).toDouble.toInt, parts(2).toDouble.toInt))}))
unmatched_events: org.apache.spark.sql.DataFrame = [id: string]
second_set_second_round: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[373] at map at command-3336180278405428:2
rddSecondSetSecondRound: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMPoint] = MapPartitionsRDD[374] at mapPartitions at command-3336180278405428:4
  • The remainder of the State Space consists of the edges of the Road Graph. In the following cells, we fetch these edges and associate them with their OSM coordinates and their reprojection.
val edges = spark.read.parquet("dbfs:/_checkpoint/edges_LT_initial") //edges of G0
val vertices = spark.read.parquet("dbfs:/_checkpoint/vertices_LT_initial").toDF("vertexId", "latitude", "longitude") //vertices of G0
edges: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint]
vertices: org.apache.spark.sql.DataFrame = [vertexId: bigint, latitude: double ... 1 more field]
val src_coordinates = edges.join(vertices,vertices("vertexId") === edges("src"), "left_outer").drop("vertexId").withColumnRenamed("latitude", "src_latitude").withColumnRenamed("longitude","src_longitude")
val edge_coordinates = src_coordinates.join(vertices,vertices("vertexId") === src_coordinates("dst")).drop("vertexId").withColumnRenamed("latitude", "dst_latitude").withColumnRenamed("longitude", "dst_longitude")
src_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 2 more fields]
edge_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 4 more fields]
val concat_coordinates = edge_coordinates.select($"src",concat($"src_latitude",lit(" "),$"src_longitude").alias("src_coordinates"), $"dst",concat($"dst_latitude",lit(" "),$"dst_longitude").alias("dst_coordinates"))
concat_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, src_coordinates: string ... 2 more fields]
val linestring_coordinates = concat_coordinates.select($"src", $"dst",concat($"src_coordinates", lit(","), $"dst_coordinates").alias("list_of_coordinates"))
linestring_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 1 more field]
val first = linestring_coordinates.select(concat(lit("LineString:"),$"src",lit("+"), $"dst").alias("LineString"),$"list_of_coordinates")
first: org.apache.spark.sql.DataFrame = [LineString: string, list_of_coordinates: string]
val ways_reprojected = first.rdd.map(line => line.toString.replaceAll("\\[","").replaceAll("\\]","")).map(line => {val parts = line.replaceAll("\"","").split(",");val arrCoords = parts.slice(1,parts.length).map(xyStr => {val xy = xyStr.split(' ');val reprojection = project_to_meters(xy(1).toString, xy(0).toString);val coords = reprojection.replaceAll(",","").replaceAll("\\[","").split(" ").slice(1,reprojection.length);val xy_new = coords(0).toString +" "+ coords(1).toString;xy_new});(parts(0).toString, arrCoords)})
ways_reprojected: org.apache.spark.rdd.RDD[(String, Array[String])] = MapPartitionsRDD[389] at map at command-3336180278405435:1
val ways_unpacked = ways_reprojected.map(item => item._1.toString + "," + item._2(0).toString + "," + item._2(1).toString)
ways_unpacked: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[390] at map at command-3336180278405436:1
val rdd_first_set = ways_unpacked
.mapPartitions(_.map(line =>{val parts = line.replaceAll("\"","").split(',');
                             val arrCoords = parts.slice(1, parts.length).map(xyStr => {val xy = xyStr.split(' ');(xy(0).toDouble.toInt, xy(1).toDouble.toInt)});
                             new GMLineString(parts(0), arrCoords)}))
rdd_first_set: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMLineString] = MapPartitionsRDD[391] at mapPartitions at command-3336180278405437:2
  • In this second round of map-matching, the distance threshold is set to 200 meters. The dimension of the Hilbert index curve is again set to its default value (256) and the number of nearest neighbours to be found is 1.
val geoMatchSecond = new GeoMatch(false, 256, 200, (-1, -1, -1, -1)) 
geoMatchSecond: org.cusp.bdi.gm.GeoMatch = GeoMatch(false,256,200.0,(-1,-1,-1,-1))
val resultRDDsecond = geoMatchSecond.spatialJoinKNN(rdd_first_set, rddSecondSetSecondRound, 1, false)
resultRDDsecond: org.apache.spark.rdd.RDD[(org.cusp.bdi.gm.geom.GMPoint, scala.collection.mutable.ListBuffer[org.cusp.bdi.gm.geom.GMLineString])] = MapPartitionsRDD[404] at mapPartitions at GeoMatch.scala:94
  • The number of events that do not lie within a 200-meter radius of any road segment is 269.
resultRDDsecond.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => (element._2.isEmpty)).count()
res20: Long = 269
  • We are interested in how many events are matched against each state.
val res = resultRDDsecond.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => !(element._2.isEmpty))
res: org.apache.spark.rdd.RDD[(String, scala.collection.mutable.ListBuffer[String])] = MapPartitionsRDD[408] at filter at command-3336180278405445:1
val result_second_round = res.map(element => (element._1, element._2(0))).toDF("PointId", "State")
result_second_round: org.apache.spark.sql.DataFrame = [PointId: string, State: string]
val edge_counts = result_second_round.groupBy("State").count
edge_counts: org.apache.spark.sql.DataFrame = [State: string, count: bigint]
val state_counts = edge_counts.union(intersection_counts)
state_counts: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [State: string, count: bigint]
val all_intersection_states = rdd_first_set_intersections.toDF("stateId", "coords").drop("coords")
val all_edge_states = rdd_first_set.toDF("stateId", "coords").drop("coords")
val all_states = all_intersection_states.union(all_edge_states)
all_states.count //number of states 
all_intersection_states: org.apache.spark.sql.DataFrame = [stateId: string]
all_edge_states: org.apache.spark.sql.DataFrame = [stateId: string]
all_states: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [stateId: string]
res24: Long = 399394
  • Find the states against which no event has been matched and assign them a count of 0, via a left-outer join with the state counts followed by na.fill(0). This way, each state in the State Space carries a numerical value: the number of accidents that have occurred within that state.
val s1 = all_states.join(state_counts, all_states("stateId") === state_counts("State"), "left_outer").drop("State")
val s_final = s1.na.fill(0)
s1: org.apache.spark.sql.DataFrame = [stateId: string, count: bigint]
s_final: org.apache.spark.sql.DataFrame = [stateId: string, count: bigint]
s_final.distinct.agg(sum("count")).show()  //11720 events in total successfully matched 
+----------+
|sum(count)|
+----------+
|     11720|
+----------+
def trim_id(stateId: String): String = {
  val res = stateId.split(":")(1)
  return res
}

def trim_point(pointId: String): String = {
  val res = pointId.split(" ")(1)
  return res
}
spark.udf.register("trim_point", trim_point(_:String): String)
spark.udf.register("trim_id", trim_id(_:String): String)

trim_id: (stateId: String)String
trim_point: (pointId: String)String
res28: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(StringType)))
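For a quick, standalone illustration (plain Scala, using the anonymised ID formats shown earlier in this notebook), these helpers simply strip the type prefixes from the state and point IDs:

```scala
// Same logic as the registered UDFs above, shown on sample inputs.
def trim_id(stateId: String): String = stateId.split(":")(1)
def trim_point(pointId: String): String = pointId.split(" ")(1)

trim_id("LineString:15389886")   // "15389886"
trim_point("Point LT2019XXX")    // "LT2019XXX"
```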
val total_result = result_first_round.union(result_second_round)
val trimed_total_result = total_result.selectExpr("trim_point(PointId) as point", "trim_id(State) as state")
total_result: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [PointId: string, State: string]
trimed_total_result: org.apache.spark.sql.DataFrame = [point: string, state: string]
  • Return here after notebook 034_06SimulatingArrivalTimesNHPP_Inversion
  • We want to map each simulated graph element to an exact location
val df = spark.read.parquet("dbfs:/roadSafety/simulation_location").toDF("simulated_location", "arrival_time")
val location_id = df.select("simulated_location")
df: org.apache.spark.sql.DataFrame = [simulated_location: string, arrival_time: double]
location_id: org.apache.spark.sql.DataFrame = [simulated_location: string]
import org.apache.spark.sql.functions._
val intersection_samples = location_id.join(nodes_df, col("simulated_location") === col("id")).select("simulated_location", "latitude", "longitude")
intersection_samples.count
val edge_ids = edge_coordinates.withColumn("edge_id", concat(col("src"), lit("+"), col("dst")))
val edge_samples = location_id.join(edge_ids, col("simulated_location") === col("edge_id")).drop("src", "dst", "edge_id")
import org.apache.spark.sql.functions._
intersection_samples: org.apache.spark.sql.DataFrame = [simulated_location: string, latitude: double ... 1 more field]
edge_ids: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 5 more fields]
edge_samples: org.apache.spark.sql.DataFrame = [simulated_location: string, src_latitude: double ... 3 more fields]
import org.apache.spark.mllib.random.RandomRDDs
val random_edge_coordinates = edge_samples.withColumn("random_sample", rand())
import org.apache.spark.mllib.random.RandomRDDs
random_edge_coordinates: org.apache.spark.sql.DataFrame = [simulated_location: string, src_latitude: double ... 4 more fields]
  • For each simulated edge, draw a uniform sample in [0, 1] and use it to interpolate between the coordinates of the edge's source and destination
def random_lat(src_lat: Double, dst_lat: Double, sample: Double): Double = {
  // Interpolate linearly from source to destination latitude; using the same
  // sample for latitude and longitude keeps the point on the segment.
  // (Taking per-coordinate min/max with a shared sample can place the point
  // off the segment when the two coordinates change in opposite directions
  // along the edge.)
  src_lat + sample * (dst_lat - src_lat)
}
def random_lon(src_lon: Double, dst_lon: Double, sample: Double): Double = {
  src_lon + sample * (dst_lon - src_lon)
}

spark.udf.register("random_lat", random_lat(_: Double, _: Double, _: Double): Double)
spark.udf.register("random_lon", random_lon(_: Double, _: Double, _: Double): Double)


val random_coordinates = random_edge_coordinates.selectExpr("random_lat(src_latitude, dst_latitude, random_sample) as latitude", "random_lon(src_longitude, dst_longitude, random_sample) as longitude")
random_lat: (src_lat: Double, dst_lat: Double, sample: Double)Double
random_lon: (src_lon: Double, dst_lon: Double, sample: Double)Double
random_coordinates: org.apache.spark.sql.DataFrame = [latitude: double, longitude: double]
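As a self-contained sanity sketch (plain Scala, independent of the notebook's UDFs and Spark), interpolating both coordinates with a single shared parameter keeps the sampled point on the segment between the two endpoints:

```scala
// Interpolate a point on the segment from (srcLat, srcLon) to (dstLat, dstLon).
def pointOnEdge(srcLat: Double, srcLon: Double,
                dstLat: Double, dstLon: Double, t: Double): (Double, Double) =
  (srcLat + t * (dstLat - srcLat), srcLon + t * (dstLon - srcLon))

// The midpoint (t = 0.5) of an edge from (54.0, 25.0) to (55.0, 24.0):
val (lat, lon) = pointOnEdge(54.0, 25.0, 55.0, 24.0, 0.5)
// lat == 54.5, lon == 24.5
```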
val df_final = random_coordinates.union(intersection_samples.select("latitude", "longitude"))
df_final.count()
df_final: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [latitude: double, longitude: double]
res36: Long = 12089
df_final.show()

Output:

+------------------+------------------+
|          latitude|         longitude|
+------------------+------------------+
|54.66xxx          |25.29yyy          |
+------------------+------------------+

Map-Matching Events on a State Space / Coarsened Road Graph with GeoMatch

Stavroula Rafailia Vlachou (LinkedIn) and Raazesh Sainudiin (LinkedIn).

This project was supported by SENSMETRY through a Data Science Project Internship 
between 2022-01-17 and 2022-06-05 to Stavroula R. Vlachou and
Databricks University Alliance with infrastructure credits from AWS to 
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.

2022, Uppsala, Sweden

import org.apache.spark.graphx._
import sqlContext.implicits._
import scala.collection.JavaConversions._
import org.cusp.bdi.gm.GeoMatch
import org.cusp.bdi.gm.geom.GMPoint
import org.cusp.bdi.gm.geom.GMLineString
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
import org.apache.spark.graphx._
import sqlContext.implicits._
import scala.collection.JavaConversions._
import org.cusp.bdi.gm.GeoMatch
import org.cusp.bdi.gm.geom.GMPoint
import org.cusp.bdi.gm.geom.GMLineString
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
spark.conf.set("spark.sql.parquet.binaryAsString", true)

val nodes_df = spark.read.parquet("dbfs:/datasets/osm/lithuania/lithuania.osm.pbf.node.parquet")
val ways_df = spark.read.parquet("dbfs:/datasets/osm/lithuania/lithuania.osm.pbf.way.parquet")
nodes_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 7 more fields]
ways_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 6 more fields]
//convert the nodes to Dataset containing the fields of interest

case class NodeEntry(nodeId: Long, latitude: Double, longitude: Double, tags: Seq[String])

val nodeDS = nodes_df.map(node => 
  NodeEntry(node.getAs[Long]("id"),
       node.getAs[Double]("latitude"),
       node.getAs[Double]("longitude"),
       node.getAs[Seq[Row]]("tags").map{case Row(key:String, value:String) => value}
)).cache()
defined class NodeEntry
nodeDS: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]

The first step is to obtain the State Space. The State Space consists of road segments and intersection points. The road segments correspond to the edges of the graph, while the intersection points can be retrieved from the ways and the nodes datasets as those nodes that are shared by more than one way. All coordinates should be in the spatial reference system EPSG:3035. For the map matching it is better to keep all intermediate points of each edge.
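The intersections table loaded below is precomputed. A minimal sketch of how such intersection nodes could be derived from the ways parquet might look as follows; this assumes a `ways_df` whose `nodes` column is an array of structs carrying a `nodeId` field (as in the osm-parquetizer layout), which is an assumption about the schema rather than something shown in this notebook:

```scala
import org.apache.spark.sql.functions.{col, explode, countDistinct}

// Hypothetical derivation: a node is an intersection if it belongs to
// more than one way. Column names are assumptions about the parquet layout.
val wayNodes = ways_df
  .select(col("id").alias("wayId"), explode(col("nodes.nodeId")).alias("nodeId"))

val intersectionsSketch = wayNodes
  .groupBy("nodeId")
  .agg(countDistinct("wayId").alias("nWays"))
  .filter(col("nWays") > 1)
  .select(col("nodeId").alias("intersectionNode"))
```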

display(dbutils.fs.ls("dbfs:/LT"))
path name size
dbfs:/LT/intersections/ intersections/ 0.0
val intersections = spark.read.parquet("dbfs:/LT/intersections")
intersections.show(1)
+----------------+
|intersectionNode|
+----------------+
|       270958413|
+----------------+
only showing top 1 row

intersections: org.apache.spark.sql.DataFrame = [intersectionNode: bigint]
intersections.count
res5: Long = 162325

The next step is to obtain the coordinates of the intersection points, which are given in decimal degrees.

val intersection_points = nodeDS.join(intersections, intersections("intersectionNode") === nodeDS("nodeId")).drop("tags", "nodeId").select("intersectionNode", "latitude", "longitude")
intersection_points.show(1)
+----------------+----------+------------------+
|intersectionNode|  latitude|         longitude|
+----------------+----------+------------------+
|        15389886|54.7309125|25.239701200000003|
+----------------+----------+------------------+
only showing top 1 row

intersection_points: org.apache.spark.sql.DataFrame = [intersectionNode: bigint, latitude: double ... 1 more field]
intersection_points.count()
res8: Long = 162325
import org.apache.spark.sql.functions.{concat, lit}
val concat_coordinates = intersection_points.select($"intersectionNode",concat($"latitude",lit(" "),$"longitude").alias("coordinates"))
concat_coordinates.show(1, false)
+----------------+-----------------------------+
|intersectionNode|coordinates                  |
+----------------+-----------------------------+
|15389886        |54.7309125 25.239701200000003|
+----------------+-----------------------------+
only showing top 1 row

import org.apache.spark.sql.functions.{concat, lit}
concat_coordinates: org.apache.spark.sql.DataFrame = [intersectionNode: bigint, coordinates: string]
val firstIntersectionStates = concat_coordinates.select(concat(lit("LineString:"),$"intersectionNode").alias("LineString"),$"coordinates")
firstIntersectionStates.show(1, false)
val firstIntersectionStates_rdd = firstIntersectionStates.rdd
firstIntersectionStates_rdd.take(1)
+-------------------+-----------------------------+
|LineString         |coordinates                  |
+-------------------+-----------------------------+
|LineString:15389886|54.7309125 25.239701200000003|
+-------------------+-----------------------------+
only showing top 1 row

firstIntersectionStates: org.apache.spark.sql.DataFrame = [LineString: string, coordinates: string]
firstIntersectionStates_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[545] at rdd at command-197980058855229:3
res11: Array[org.apache.spark.sql.Row] = Array([LineString:15389886,54.7309125 25.239701200000003])
if(!ArcGISRuntimeEnvironment.isInitialized())
    {
      ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
      ArcGISRuntimeEnvironment.initialize() 
    }
def project_to_meters(lon: String, lat: String): String = { 
    
    if(!ArcGISRuntimeEnvironment.isInitialized())
    {
      ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
      ArcGISRuntimeEnvironment.initialize() 
    }
  
    val initial_point = new Point(lon.toDouble, lat.toDouble, SpatialReference.create(4326))
    val reprojection = GeometryEngine.project(initial_point, SpatialReference.create(3035))
    reprojection.toString
}
spark.udf.register("project_to_meters", project_to_meters(_:String, _:String):String)
project_to_meters: (lon: String, lat: String)String
res14: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,StringType,Some(List(StringType, StringType)))
val intersections_reprojected = firstIntersectionStates_rdd.map(line => line.toString.replaceAll("\\[","").replaceAll("\\]","")).map(line => {val parts = line.replaceAll("\"","").split(",");val arrCoords = parts.slice(1,parts.length).map(xyStr => {val xy = xyStr.split(" ");val reprojection = project_to_meters(xy(1).toString, xy(0).toString);val coords = reprojection.replaceAll(",","").replaceAll("\\[","").split(" ").slice(1,reprojection.length);val xy_new = coords(0).toString +" "+ coords(1).toString;xy_new});(parts(0).toString, arrCoords)})
intersections_reprojected: org.apache.spark.rdd.RDD[(String, Array[String])] = MapPartitionsRDD[547] at map at command-197980058855232:1
intersections_reprojected.take(1)
res15: Array[(String, Array[String])] = Array((LineString:15389886,Array(5294624.872733 3617234.130316)))
val intersections_unpacked = intersections_reprojected.map(item => item._1.toString + "," + item._2(0).toString)
intersections_unpacked.take(1)
intersections_unpacked: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[548] at map at command-197980058855234:1
res16: Array[String] = Array(LineString:15389886,5294624.872733 3617234.130316)
val rdd_first_set_intersections = intersections_unpacked.mapPartitions(_.map(line =>{val parts = line.replaceAll("\"","").split(',');val arrCoords = parts.slice(1, parts.length).map(xyStr => {val xy = xyStr.split(' ');(xy(0).toDouble.toInt, xy(1).toDouble.toInt)});new GMPoint(parts(0), arrCoords(0))}))
rdd_first_set_intersections: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMPoint] = MapPartitionsRDD[549] at mapPartitions at command-197980058855235:1
rdd_first_set_intersections.take(1)
res17: Array[org.cusp.bdi.gm.geom.GMPoint] = Array(GMPoint(LineString:15389886,(5294624,3617234)))

Next, we need to obtain the set of points that are to be map-matched. In this case the set of points corresponds to the accident events occurring in LT.

val events = spark.read.format("csv").load("/FileStore/tables/LTnodes.csv").rdd.map(line => line.toString)
events: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[563] at map at command-197980058855239:1

events.take(1)

Output:

Array([Point LT2019XXX,52aaa.18bbb,36ccc.21ddd])

val all_accidents = spark.read.format("csv").load("/FileStore/tables/LTnodes.csv").toDF("PointId", "longitude", "latitude")
all_accidents: org.apache.spark.sql.DataFrame = [PointId: string, longitude: string ... 1 more field]
val rddSecondSet = events.mapPartitions(_.map(line => {val parts = line.replaceAll("\"","").replaceAll("\\[","").replaceAll("\\]","").split(',');new GMPoint(parts(0), (parts(1).toDouble.toInt, parts(2).toDouble.toInt))}))
rddSecondSet: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMPoint] = MapPartitionsRDD[572] at mapPartitions at command-197980058855241:1

Implement Map Matching

val geoMatch = new GeoMatch(false, 256, 20, (-1, -1, -1, -1)) //n (= dimension of the Hilbert curve) should be a power of 2. 
geoMatch: org.cusp.bdi.gm.GeoMatch = GeoMatch(false,256,20.0,(-1,-1,-1,-1))
val resultRDD = geoMatch.spatialJoinKNN(rdd_first_set_intersections, rddSecondSet, 1, false)
resultRDD: org.apache.spark.rdd.RDD[(org.cusp.bdi.gm.geom.GMPoint, scala.collection.mutable.ListBuffer[org.cusp.bdi.gm.geom.GMPoint])] = MapPartitionsRDD[585] at mapPartitions at GeoMatch.scala:94

The output of the above command with IDs and locations anonymised is as follows:

+----------------------------------------+---------------------------------------------+
|k                                       |line                                         |
+----------------------------------------+---------------------------------------------+
|[Point LT20xyABCDEF, [521xxxx, 362yyyy]]|[[LineString:1254578sss, [521zzzz, 362zzzz]]]|
+----------------------------------------+---------------------------------------------+
only showing top 1 row
resultRDD.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => (element._2.isEmpty)).count()
res19: Long = 8246
resultRDD.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => !(element._2.isEmpty)).count()
res20: Long = 3743
val unmatched_events = resultRDD.filter(element => (element._2.isEmpty)).map(element => element._1.payload).toDF("id")

val second_set_second_round = unmatched_events.join(all_accidents, unmatched_events("id") === all_accidents("PointId")).drop("id").rdd.map(line => line.toString)

val rddSecondSetSecondRound = second_set_second_round.mapPartitions(_.map(line => {val parts = line.replaceAll("\"","").replaceAll("\\[","").replaceAll("\\]","").split(',');new GMPoint(parts(0), (parts(1).toDouble.toInt, parts(2).toDouble.toInt))}))
unmatched_events: org.apache.spark.sql.DataFrame = [id: string]
second_set_second_round: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[599] at map at command-197980058855248:3
rddSecondSetSecondRound: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMPoint] = MapPartitionsRDD[600] at mapPartitions at command-197980058855248:5
val edges = spark.read.parquet("dbfs:/_checkpoint/edges_LT_100")
val vertices = spark.read.parquet("dbfs:/_checkpoint/vertices_LT_100").toDF("vertexId", "latitude", "longitude")
edges: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint]
vertices: org.apache.spark.sql.DataFrame = [vertexId: bigint, latitude: double ... 1 more field]
edges.show(1)
+--------+----------+
|     src|       dst|
+--------+----------+
|31451266|4397542060|
+--------+----------+
only showing top 1 row
val src_coordinates = edges.join(vertices,vertices("vertexId") === edges("src"), "left_outer").drop("vertexId").withColumnRenamed("latitude", "src_latitude").withColumnRenamed("longitude","src_longitude")
val edge_coordinates = src_coordinates.join(vertices,vertices("vertexId") === src_coordinates("dst")).drop("vertexId").withColumnRenamed("latitude", "dst_latitude").withColumnRenamed("longitude", "dst_longitude")
src_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 2 more fields]
edge_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 4 more fields]
import org.apache.spark.sql.functions.{concat, lit}
val concat_coordinates = edge_coordinates.select($"src",concat($"src_latitude",lit(" "),$"src_longitude").alias("src_coordinates"), $"dst",concat($"dst_latitude",lit(" "),$"dst_longitude").alias("dst_coordinates"))
import org.apache.spark.sql.functions.{concat, lit}
concat_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, src_coordinates: string ... 2 more fields]
concat_coordinates.show(1, false)
+----------+---------------------+--------+-------------------------------------+
|src       |src_coordinates      |dst     |dst_coordinates                      |
+----------+---------------------+--------+-------------------------------------+
|4095919448|54.6666894 25.1168508|31447217|54.666942600000006 25.115928200000003|
+----------+---------------------+--------+-------------------------------------+
only showing top 1 row
val linestring_coordinates = concat_coordinates.select($"src", $"dst",concat($"src_coordinates", lit(","), $"dst_coordinates").alias("list_of_coordinates"))
linestring_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 1 more field]
linestring_coordinates.show(1, false)
+----------+--------+-----------------------------------------------------------+
|src       |dst     |list_of_coordinates                                        |
+----------+--------+-----------------------------------------------------------+
|4095919448|31447217|54.6666894 25.1168508,54.666942600000006 25.115928200000003|
+----------+--------+-----------------------------------------------------------+
only showing top 1 row
val first = linestring_coordinates.select(concat(lit("LineString:"),$"src",lit("+"), $"dst").alias("LineString"),$"list_of_coordinates")
first: org.apache.spark.sql.DataFrame = [LineString: string, list_of_coordinates: string]
val first_rdd = first.rdd
first_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[675] at rdd at command-197980058855258:1
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
if(!ArcGISRuntimeEnvironment.isInitialized())
    {
      ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
      ArcGISRuntimeEnvironment.initialize() 
    }
def project_to_meters(lon: String, lat: String): String = { 
    
    if(!ArcGISRuntimeEnvironment.isInitialized())
    {
      ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
      ArcGISRuntimeEnvironment.initialize() 
    }
  
    val initial_point = new Point(lon.toDouble, lat.toDouble, SpatialReference.create(4326))
    val reprojection = GeometryEngine.project(initial_point, SpatialReference.create(3035))
    reprojection.toString
}
spark.udf.register("project_to_meters", project_to_meters(_:String, _:String):String)
project_to_meters: (lon: String, lat: String)String
res31: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,StringType,Some(List(StringType, StringType)))
first_rdd.take(1)
res32: Array[org.apache.spark.sql.Row] = Array([LineString:4095919448+31447217,54.6666894 25.1168508,54.666942600000006 25.115928200000003])
val ways_reprojected = first_rdd.map(line => line.toString.replaceAll("\\[","").replaceAll("\\]","")).map(line => {val parts = line.replaceAll("\"","").split(",");val arrCoords = parts.slice(1,parts.length).map(xyStr => {val xy = xyStr.split(" ");val reprojection = project_to_meters(xy(1).toString, xy(0).toString);val coords = reprojection.replaceAll(",","").replaceAll("\\[","").split(" ").slice(1,reprojection.length);val xy_new = coords(0).toString +" "+ coords(1).toString;xy_new});(parts(0).toString,arrCoords)})
ways_reprojected: org.apache.spark.rdd.RDD[(String, Array[String])] = MapPartitionsRDD[677] at map at command-197980058855264:1
ways_reprojected.take(1)
res33: Array[(String, Array[String])] = Array((LineString:4095919448+31447217,Array(5288428.785893 3608569.901562, 5288364.771866 3608585.141629)))
ways_reprojected.map(item => item._2(1)).take(1)
res34: Array[String] = Array(5288364.771866 3608585.141629)
val ways_unpacked = ways_reprojected.map(item => item._1.toString + "," + item._2(0).toString + "," + item._2(1).toString)
ways_unpacked: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[679] at map at command-197980058855265:1
val rdd_first_set = ways_unpacked.mapPartitions(_.map(line =>{val parts = line.replaceAll("\"","").split(',');val arrCoords = parts.slice(1, parts.length).map(xyStr => {val xy = xyStr.split(' ');(xy(0).toDouble.toInt, xy(1).toDouble.toInt)});new GMLineString(parts(0), arrCoords)}))
rdd_first_set: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMLineString] = MapPartitionsRDD[680] at mapPartitions at command-197980058855267:1
rdd_first_set.count()
res35: Long = 730237
def unpack_lat(str: String): String = {
        val lat = str.replaceAll(",","").replaceAll("\\[","").split(" ")(2)
        return lat
}
spark.udf.register("unpack_lat", unpack_lat(_:String): String)

def unpack_lon(str: String): String = {
        val lon = str.replaceAll(",","").replaceAll("\\[","").split(" ")(1)
        return lon
}
spark.udf.register("unpack_lon", unpack_lon(_:String): String)
unpack_lat: (str: String)String
unpack_lon: (str: String)String
res36: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(StringType)))
val geoMatchSecond = new GeoMatch(false, 256, 200, (-1, -1, -1, -1)) //n (= dimension of the Hilbert curve) should be a power of 2. 
geoMatchSecond: org.cusp.bdi.gm.GeoMatch = GeoMatch(false,256,200.0,(-1,-1,-1,-1))
val resultRDDsecond = geoMatchSecond.spatialJoinKNN(rdd_first_set, rddSecondSetSecondRound, 1, false)
resultRDDsecond: org.apache.spark.rdd.RDD[(org.cusp.bdi.gm.geom.GMPoint, scala.collection.mutable.ListBuffer[org.cusp.bdi.gm.geom.GMLineString])] = MapPartitionsRDD[693] at mapPartitions at GeoMatch.scala:94
resultRDDsecond.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => (element._2.isEmpty)).count()
res37: Long = 275

The next step is to obtain, for each state, the number of events matched against it.

val res = resultRDDsecond.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => !(element._2.isEmpty))
res: org.apache.spark.rdd.RDD[(String, scala.collection.mutable.ListBuffer[String])] = MapPartitionsRDD[697] at filter at command-197980058855277:1
val res_df = res.map(element => (element._1, element._2(0))).toDF("PointId", "State")
res_df: org.apache.spark.sql.DataFrame = [PointId: string, State: string]
val edge_counts = res_df.groupBy("State").count
edge_counts: org.apache.spark.sql.DataFrame = [State: string, count: bigint]

edge_counts.show(2, false)

Output:

+--------------------------------+-----+
|State                           |count|
+--------------------------------+-----+
|LineString:469327286+3637433937 |a    |
|LineString:2488853231+272553182 |b    |
|LineString:5074963276+2221962222|c    |
+--------------------------------+-----+
val res1 = resultRDD.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => !(element._2.isEmpty))
val res1_df = res1.map(element => (element._1, element._2(0))).toDF("PointId", "State")
val intersection_counts = res1_df.groupBy("State").count
res1: org.apache.spark.rdd.RDD[(String, scala.collection.mutable.ListBuffer[String])] = MapPartitionsRDD[706] at filter at command-197980058855280:1
res1_df: org.apache.spark.sql.DataFrame = [PointId: string, State: string]
intersection_counts: org.apache.spark.sql.DataFrame = [State: string, count: bigint]
import org.apache.spark.sql.functions._
val state_counts = edge_counts.union(intersection_counts)
state_counts.agg(sum("count")).show()
+----------+
|sum(count)|
+----------+
|     11714|
+----------+

import org.apache.spark.sql.functions._
state_counts: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [State: string, count: bigint]

Find the states with no matched events and assign them a count of 0: left-outer join all states against state_counts and fill the missing counts with 0.

val all_intersection_states = rdd_first_set_intersections.toDF("stateId", "coords").drop("coords")
val all_edge_states = rdd_first_set.toDF("stateId", "coords").drop("coords")
val all_states = all_intersection_states.union(all_edge_states)
all_states.count
all_intersection_states: org.apache.spark.sql.DataFrame = [stateId: string]
all_edge_states: org.apache.spark.sql.DataFrame = [stateId: string]
all_states: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [stateId: string]
res49: Long = 892562
val s1 = all_states.join(state_counts, all_states("stateId") === state_counts("State"), "left_outer").drop("State")
val s_final = s1.na.fill(0)
s1: org.apache.spark.sql.DataFrame = [stateId: string, count: bigint]
s_final: org.apache.spark.sql.DataFrame = [stateId: string, count: bigint]
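The pattern above — left-outer join the full state list against the observed counts, then `na.fill(0)` — can be sketched in plain Python (hypothetical state ids and counts, not the notebook's data):

```python
# Left-outer join with zero-fill, as in all_states.join(...).na.fill(0):
# every state keeps a row; states with no matched events get count 0.
all_states = ["s1", "s2", "s3", "s4"]   # hypothetical state ids
state_counts = {"s1": 7, "s3": 2}       # counts for matched states only

s_final = {s: state_counts.get(s, 0) for s in all_states}
print(s_final)  # {'s1': 7, 's2': 0, 's3': 2, 's4': 0}
```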

Posterior - Conditional Distribution of State Counts for a Given Time Unit

Stavroula Rafailia Vlachou (LinkedIn) and Raazesh Sainudiin (LinkedIn).

This project was supported by SENSMETRY through a Data Science Project Internship 
between 2022-01-17 and 2022-06-05 to Stavroula R. Vlachou and 
Databricks University Alliance with infrastructure credits from AWS to 
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.

2022, Uppsala, Sweden

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.expressions.Window
import spark.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.expressions.Window
import spark.implicits._
val different_dates = spark.read.parquet("/FileStore/tables/LTaccidents_id_date.parquet").toDF("id", "date").orderBy($"date".asc).select("date").rdd.map(element => element(0)).collect.toSet;
val distinct_dates  = different_dates.toList;
//the conditional distribution for each state given a time unit
def conditional_distribution(sample_date: String): org.apache.spark.sql.DataFrame = {
  import spark.implicits._
  
  val id_date = spark.read.parquet("/FileStore/tables/LTaccidents_id_date.parquet").toDF("id", "date")
  val matched_events = spark.read.parquet("dbfs:/_checkpoint/GeoMatch_G0").toDF("point", "state")
  val state_counts = matched_events.join(id_date, matched_events("point") === id_date("id"), "inner").drop("id").where($"date" === sample_date).groupBy("state").count()
  val global_count = state_counts.count.toFloat
  val state_space = spark.read.parquet("dbfs:/_checkpoint/StateSpaceInitialG0").toDF("initial_state","count").drop("count")
  val per_state_conditional_counts = state_space.join(state_counts, state_space("initial_state") === state_counts("state"), "left_outer").na.fill(0, Seq("count"))
  val number_of_states = state_space.count.toFloat
  val all_state_counts = per_state_conditional_counts.select("initial_state", "count").withColumn("prior", lit(1f/number_of_states)).orderBy($"count".asc)
  val df = all_state_counts.select(col("initial_state"), col("count").cast(FloatType), col("prior")).withColumn("global_count", lit(global_count))

  val posteriors = df.selectExpr("initial_state", "count + prior as posterior", "global_count")

  val posterior_means = posteriors.selectExpr("initial_state","posterior/(global_count + 1) as posterior_mean").orderBy($"posterior_mean".asc)
  posterior_means.createOrReplaceTempView("posterior_means")
  val df_1 = spark.sql("select initial_state, posterior_mean,"+" SUM(posterior_mean) over ( order by initial_state rows between unbounded preceding and current row ) cumulative_Sum " + " from posterior_means").toDF("initial_state", "posterior_mean", "cumulative_Sum")
  val df_2 = df_1.withColumn("prob_interval", lag($"cumulative_Sum", 1,0).over(Window.orderBy($"cumulative_Sum".asc))).select("initial_state", "prob_interval", "cumulative_Sum")

  val probability_intervals = df_2.selectExpr("initial_state", "(prob_interval, cumulative_Sum) as prob_interval")
  return probability_intervals
}
conditional_distribution: (sample_date: String)org.apache.spark.sql.DataFrame
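Numerically, `conditional_distribution` adds a uniform prior mass 1/number_of_states to each state's count, divides by `global_count + 1`, and converts the ascending cumulative sums into half-open sampling intervals. A plain-Python sketch with hypothetical counts:

```python
# Sketch of the posterior computation: prior = 1/number_of_states,
# posterior_mean = (count + prior) / (global_count + 1), then cumulative
# sums in ascending order become (start, end] probability intervals.
counts = {"a": 3, "b": 0, "c": 1}  # hypothetical per-state counts (0 = unobserved)
n_states = len(counts)
prior = 1.0 / n_states
global_count = sum(1 for c in counts.values() if c > 0)  # states observed that date

posterior_mean = {s: (c + prior) / (global_count + 1) for s, c in counts.items()}

intervals, cum = {}, 0.0
for state, p in sorted(posterior_mean.items(), key=lambda kv: kv[1]):
    intervals[state] = (cum, cum + p)
    cum += p
```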
//run only once per cluster 
import org.apache.spark.sql.SaveMode // needed for SaveMode.Overwrite below
var date = ""

for (date <- distinct_dates){
    val a = date.toString 
    var directory = "dbfs:/roadSafety"
    val probabilities = conditional_distribution(sample_date=a)
    directory += "_" + a 
    dbutils.fs.mkdirs(directory)
    probabilities.write.mode(SaveMode.Overwrite).parquet(directory + "_CD")
    probabilities.unpersist
    display(dbutils.fs.ls(directory))  
}
//The distribution of states independent of time 
def unconditional_distribution(): org.apache.spark.sql.DataFrame = {
  import spark.implicits._
  val state_space = spark.read.parquet("dbfs:/_checkpoint/StateSpaceInitialG0").toDF("initial_state","count").drop("count")
  val number_of_states = state_space.count.toFloat
  val priors = state_space.select("initial_state").withColumn("prior", lit(1f/number_of_states))
  priors.createOrReplaceTempView("priors")
  val df_1 = spark.sql("select initial_state, prior,"+" SUM(prior) over ( order by initial_state rows between unbounded preceding and current row ) cumulative_Sum " + " from priors").toDF("initial_state", "prior", "cumulative_Sum")
  val df_2 = df_1.withColumn("prob_interval", lag($"cumulative_Sum", 1,0).over(Window.orderBy($"cumulative_Sum".asc))).select("initial_state", "prob_interval", "cumulative_Sum")

  val probability_intervals = df_2.selectExpr("initial_state", "(prob_interval, cumulative_Sum) as prob_interval")
  return probability_intervals
}
unconditional_distribution: ()org.apache.spark.sql.DataFrame
unconditional_distribution.write.mode("overwrite").parquet("dbfs:/roadSafety_no_date_CD")
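Downstream, the stored `(start, end]` intervals are used for inverse-CDF sampling: draw u ~ Uniform(0,1) and select the state whose interval contains u, mirroring the Spark filters `start < value` and `end >= value`. A plain-Python sketch with hypothetical intervals:

```python
import random

# Inverse-CDF sampling from (start, end] probability intervals,
# matching the filters start < value and end >= value used later.
intervals = {"a": (0.0, 0.25), "b": (0.25, 0.75), "c": (0.75, 1.0)}  # hypothetical

def sample_state(u):
    for state, (start, end) in intervals.items():
        if start < u <= end:
            return state

random.seed(1234)
states = [sample_state(random.random()) for _ in range(1000)]
print(states.count("b") / len(states))  # should be near 0.5
```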

Simulating the Arrival Times of a NHPP by Inversion

Stavroula Rafailia Vlachou (LinkedIn) and Raazesh Sainudiin (LinkedIn).

This project was supported by SENSMETRY through a Data Science Project Internship 
between 2022-01-17 and 2022-06-05 to Stavroula R. Vlachou and 
Databricks University Alliance with infrastructure credits from AWS to 
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.

2022, Uppsala, Sweden

import org.apache.spark.mllib.random._
import math.{log, floor, ceil}
import org.apache.spark.sql.functions._
import scala.util.{Try,Success,Failure}
import scala.util.control.Exception
import org.apache.spark.mllib.random.RandomRDDs._
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.mllib.random._
import math.{log, floor, ceil}
import org.apache.spark.sql.functions._
import scala.util.{Try, Success, Failure}
import scala.util.control.Exception
import org.apache.spark.mllib.random.RandomRDDs._
import scala.collection.mutable.ArrayBuffer
  • Load the arrival times of the events from one realization of the process
val df = spark.read.parquet("FileStore/tables/LT_time_intervals").select("prev_date")
df: org.apache.spark.sql.DataFrame = [prev_date: bigint]
val ordered_T = df.collect().toArray :+ 1461 // append 1461 (days from 2017-01-01 to 2020-12-31) as the right endpoint
val generator = new UniformGenerator()
generator.setSeed(1234L) //set the seed for reproducibility of results
generator: org.apache.spark.mllib.random.UniformGenerator = org.apache.spark.mllib.random.UniformGenerator@2c211685
//initialization 
var i = 1
var u = generator.nextValue
var E = -math.log(1-u)
var T = 0.0
var m = 0.0
var width = 0.0
var samples = Array[Double]()
val n = 11720 //total number of observations
val k = 1  //number of realisations
i: Int = 1
u: Double = 0.9499610869333489
E: Double = 2.9949543149092834
T: Double = 0.0
m: Double = 0.0
width: Double = 0.0
samples: Array[Double] = Array()
n: Int = 11720
k: Int = 1
// Invert the empirical integrated intensity: E is an arrival time of a
// unit-rate Poisson process, mapped to calendar time by linear interpolation
// between consecutive entries of ordered_T.
while (E < n/k){
    m = math.floor(((n+1)*k/n)*E)
    // ordered_T mixes Rows and the appended Int, so strip the Row brackets
    // from the string representation before parsing to Double
    width = ordered_T(m.toInt+1).toString.replaceAll("\\[", "").replaceAll("\\]", "").toDouble - ordered_T(m.toInt).toString.replaceAll("\\[", "").replaceAll("\\]", "").toDouble
    T = ordered_T(m.toInt).toString.replaceAll("\\[", "").replaceAll("\\]", "").toDouble + width * (((n+1)*k/n)*E - m).toDouble
    samples = samples :+ T
    i += 1 
    u = generator.nextValue
    E -= math.log(1-u)
}
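The loop above is inversion sampling for a non-homogeneous Poisson process: `E` accumulates Exp(1) inter-arrival times (arrival times of a unit-rate process), and each `E` is mapped to calendar time by linear interpolation through `ordered_T`, the empirical inverse of the integrated intensity. A plain-Python sketch with hypothetical arrival times, with an explicit bounds guard added:

```python
import math
import random

# Inversion sampling of a NHPP: map unit-rate Poisson arrival times E
# through the empirical integrated intensity, by linear interpolation
# between consecutive entries of ordered_T (hypothetical times here).
random.seed(1234)
ordered_T = [0.0, 2.0, 5.0, 9.0, 14.0, 20.0]  # hypothetical ordered event times
n = len(ordered_T) - 1                         # number of observed events
k = 1                                          # number of realisations

samples = []
E = -math.log(1 - random.random())             # first Exp(1) arrival time
while E < n / k:
    m = math.floor(((n + 1) * k / n) * E)
    if m + 1 >= len(ordered_T):                # guard: stay inside ordered_T
        break
    width = ordered_T[m + 1] - ordered_T[m]
    T = ordered_T[m] + width * (((n + 1) * k / n) * E - m)
    samples.append(T)
    E += -math.log(1 - random.random())        # next unit-rate arrival time
```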
val arrival_samples = sc.parallelize(samples)
val rounded_arrivals = arrival_samples.map(item => math.ceil(item))
val sample_df = rounded_arrivals.toDF("day").groupBy("day").count.orderBy($"day".asc)
arrival_samples: org.apache.spark.rdd.RDD[Double] = ParallelCollectionRDD[14874] at parallelize at command-1211269020742804:1
rounded_arrivals: org.apache.spark.rdd.RDD[Double] = MapPartitionsRDD[14875] at map at command-1211269020742804:2
sample_df: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [day: double, count: bigint]
sample_df.count() //number of simulated days 
sample_df.select(sum("count")).show() //number of simulated events
+----------+
|sum(count)|
+----------+
|     11755|
+----------+
val times = sample_df 
val initialisation = sc.parallelize(Seq((" ", 0.0))).toDF("initial_state", "time_unit")
times: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [day: double, count: bigint]
initialisation: org.apache.spark.sql.DataFrame = [initial_state: string, time_unit: double]
val time_day_map = spark.sql("SELECT sequence(to_date('2017-01-01'), to_date('2020-12-31'), interval 1 day) as dates").select(explode($"dates").alias("day_of_year"), (monotonically_increasing_id + 1).alias("time_unit"))
val initial = ArrayBuffer[(String, Double)]()
val times_list = times.collect()
for (time <-  times_list){
    val day = time_day_map.filter(col("time_unit") === time(0)).select("day_of_year").collect()(0)(0).toString
    val count = time(1).asInstanceOf[Long]
    try {val conditional_distribution = spark.read.parquet("dbfs:/roadSafety_" + day + "_CD").select($"initial_state", $"prob_interval._1".alias("start"), $"prob_interval._2".alias("end"))
         val uniform_samples = uniformRDD(sc,count).toDF()
         val cross_samples_intervals = uniform_samples.crossJoin(conditional_distribution)
         val samples = cross_samples_intervals.filter("start < value").filter("end >= value").select("initial_state").cache()
         val location_time = samples.rdd.map(item => (item(0).toString, time(0).asInstanceOf[Double])).collect()
         initial ++= location_time
         samples.unpersist
         println(time(0).toString)
        }
    catch {
      case u: org.apache.spark.sql.AnalysisException => {
        println("Path does not exist " + day + ". Sampling independent of time")
        val conditional_distribution = spark.read.parquet("dbfs:/roadSafety_no_date_CD").select($"initial_state", $"prob_interval.prob_interval".alias("start"), $"prob_interval.cumulative_Sum".alias("end"))
        val uniform_samples = uniformRDD(sc,count).toDF()
        val cross_samples_intervals = uniform_samples.crossJoin(conditional_distribution)
        val samples = cross_samples_intervals.filter("start < value").filter("end >= value").select("initial_state").cache()
        val location_time = samples.rdd.map(item => (item(0).toString, time(0).asInstanceOf[Double])).collect()
        initial ++= location_time
        samples.unpersist
        println(time(0).toString)
        
      }
    }  
}
Output (abridged): the loop prints each simulated time unit in ascending order, from 1.0 through 1412.0, skipping units with no simulated events (e.g. 111.0, 118.0, 312.0). For any date whose conditional distribution was never written to DBFS, it falls back to the time-independent distribution:

Path does not exist 2017-02-09. Sampling independent of time
time_day_map: org.apache.spark.sql.DataFrame = [day_of_year: date, time_unit: bigint]
initial: scala.collection.mutable.ArrayBuffer[(String, Double)] = ArrayBuffer((4368444509,1.0), (2424668863+4975677371,1.0), (1625682383,2.0), ...)
(79672822,121.0), (671452932,121.0), (2249967174,121.0), (671452932,121.0), (5163798580+5163798581,122.0), (32337924+763634384,122.0), (421368977+1684694008,122.0), (1145771998+822048855,122.0), (924615848+303583180,122.0), (850776167,123.0), (761890866,123.0), (741684488+741684516,124.0), (1530349807,124.0), (1530349807,124.0), (2379979513+2364390338,124.0), (664158323,124.0), (1530349807,124.0), (33143924+8647346243,124.0), (2208884052,124.0), (664158323,124.0), (2914930062,124.0), (664158323,124.0), (1576250536,124.0), (2313116061,124.0), (3905293672+3905293672,124.0), (99171112+468098177,124.0), (2914930062,124.0), (2208884052,124.0), (2379979513+2364390338,124.0), (2914930062,124.0), (1747606452+371714002,124.0), (2033208931,124.0), (1530349807,124.0), (1892263414,124.0), (5834149249,124.0), (4021523900+4021523900,124.0), (32449296+32449294,125.0), (1508768578,125.0), (3817042384+3817042384,125.0), (1508768578,125.0), (638949148+867132773,125.0), (3817042384+3817042384,125.0), (1508768578,125.0), (32449296+32449294,125.0), (2637895619+2043449691,125.0), (2522426546+1833291602,125.0), (1001268109,125.0), (32449296+32449294,125.0), (1885204134+3025020911,125.0), (1001268109,125.0), (282776849+31447762,126.0), (1532706305,126.0), (59971295,126.0), (32324664,126.0), (2699991905+3946708557,126.0), (32324664,126.0), (32324664,126.0), (838630723,126.0), (421369053+1500341905,126.0), (1136532100+3603289690,126.0), (1857324410,126.0), (421369053+1500341905,126.0), (1857324410,126.0), (2417963673,126.0), (2417963673,126.0), (838630723,126.0), (59971295,126.0), (371712696+371712930,126.0), (428001748+837556746,127.0), (8990895368,127.0), (875843369+99159011,127.0), (8296159895,127.0), (8917108974,127.0), (428001748+837556746,127.0), (428001748+837556746,127.0), (673605204,128.0), (1612176500+2544330449,128.0), (1612176500+2544330449,128.0), (2071749918,128.0), (2071749918,128.0), (1612176500+2544330449,128.0), (411397954,128.0), (673605204,128.0), 
(1612176500+2544330449,128.0), (861783706+861782814,128.0), (673605204,128.0), (2071749918,128.0), (3617407575+32341473,129.0), (3617407575+32341473,129.0), (560477778+560477826,129.0), (1534411751+1534411469,129.0), (3617407575+32341473,129.0), (1534411751+1534411469,129.0), (1306730546,129.0), (4559143062,129.0), (3617407575+32341473,129.0), (1513211288+1513211288,130.0), (1144004920+1144004896,130.0), (2588955866,130.0), (36806444,130.0), (977638371+3592236210,130.0), (977638371+3592236210,130.0), (977638371+3592236210,130.0), (8299422859+5745389608,131.0), (33140425+2320320623,131.0), (2049428806,131.0), (997670026,131.0), (1201326476,131.0), (33140425+2320320623,131.0), (4319726742+4319726758,131.0), (9076976196+9076976196,131.0), (8299422859+5745389608,131.0), (9076976196+9076976196,131.0), (1597662295,131.0), (9076976196+9076976196,131.0), (997670026,131.0), (701189135,131.0), (2049428806,131.0), (41850576+78151900,131.0), (41850576+78151900,131.0), (33140425+2320320623,131.0), (2130812867,131.0), (2049428806,131.0), (41850576+78151900,131.0), (1201326476,131.0), (454487103+454487105,132.0), (7829481088,132.0), (162360518+250998985,132.0), (2868207489,132.0), (343467289+722953613,132.0), (343467289+722953613,132.0), (343467289+722953613,132.0), (2868207489,132.0), (32843212,132.0), (796013109,133.0), (2265022473+332270108,133.0), (5041997631,133.0), (560477894,133.0), (459884206,133.0), (560477894,133.0), (3728924143+864598775,133.0), (9276069144+3404472948,133.0), (509915318,133.0), (796013109,133.0), (509915318,133.0), (509915318,133.0), (5041997631,133.0), (459884206,133.0), (1017873799+1017873843,133.0), (459884206,133.0), (459884206,133.0), (9276069144+3404472948,133.0), (2098014328,134.0), (1479204328+1479204409,134.0), (2441490656+117053372,134.0), (461668700,134.0), (699790533+699790564,134.0), (1399469404,135.0), (1399469404,135.0), (1928971453,136.0), (1092778941+9626313335,136.0), (1928971453,136.0), (4487066323+1937844649,136.0), 
(364152927,136.0), (430771927,136.0), (769311044+1598659408,136.0), (1091364909+8245890249,136.0), (769311044+1598659408,136.0), (1092778941+9626313335,136.0), (2534694080+9227149677,136.0), (769311044+1598659408,136.0), (919597247+417492988,136.0), (821373861+2119428072,136.0), (420760611,136.0), (769311044+1598659408,136.0), (5698711201,136.0), (2520255231+2034323927,136.0), (181324580,136.0), (2520255231+2034323927,136.0), (7294603948+7294603949,136.0), (430771927,136.0), (79668785,137.0), (73419186,137.0), (31448343,137.0), (519329104,137.0), (504642878,137.0), (2351498507+2351498445,137.0), (2351498507+2351498445,137.0), (73419186,137.0), (2211708906,137.0), (372329733,137.0), (31448343,137.0), (31448343,137.0), (504642878,137.0), (818675978+560064854,138.0), (58304256+60002230,138.0), (722953640,138.0), (73419192,138.0), (73419192,138.0), (33732301,138.0), (722953640,138.0), (33732301,138.0), (1104466256+474708878,138.0), (1152426544,138.0), (73419192,138.0), (58304256+60002230,138.0), (1152426544,138.0), (33732301,138.0), (2378979201+1744549002,139.0), (2034521571+2034521558,139.0), (2378979201+1744549002,139.0), (2608166052,139.0), (8322485005+4406661355,139.0), (2608166052,139.0), (8246161251,140.0), (3524132662+3524132656,140.0), (8246161251,140.0), (287914012,140.0))
times_list: Array[org.apache.spark.sql.Row] = Array([1.0,2], [2.0,10], [3.0,5], [4.0,10], [5.0,5], [6.0,4], [7.0,2], [8.0,3], [9.0,4], [10.0,13], ..., [1009.0,8], [1010.0,7], [1011.0,10])
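The `times_list` output above appears to tabulate, for each arrival time, how many simulated trajectories arrive at that time. A minimal plain-Scala sketch of that kind of tally, using toy data rather than the notebook's simulation output:

```scala
// Toy arrival times (hypothetical data, not the notebook's simulation output).
val arrivals = Seq(1.0, 1.0, 2.0, 3.0, 3.0, 3.0)

// Tally how many trajectories share each arrival time, mirroring the
// per-arrival-time counts shown in times_list above.
val histogram: Map[Double, Int] =
  arrivals.groupBy(identity).map { case (t, ts) => t -> ts.size }
```

In the notebook itself this aggregation would be done at scale with Spark's `groupBy` and `count` on a DataFrame rather than on a local collection.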
val location_simulation = initial.toDF("simulated_location", "arrival_time")
location_simulation: org.apache.spark.sql.DataFrame = [simulated_location: string, arrival_time: double]
location_simulation.show(3)
+--------------------+------------+
|  simulated_location|arrival_time|
+--------------------+------------+
|          4368444509|         1.0|
|2424668863+497567...|         1.0|
|          1625682383|         2.0|
+--------------------+------------+
only showing top 3 rows

ScaDaMaLe Course site and book

Transformation of coordinates using the ArcGIS Runtime library

Virginia Jimenez Mohedano (LinkedIn), Stavroula Rafailia Vlachou (LinkedIn) and Raazesh Sainudiin (LinkedIn).

This project was supported by UAB SENSMETRY through a Data Science Thesis Internship 
(2022-01-17 to 2022-06-05) awarded to Stavroula R. Vlachou and Virginia J. Mohedano, 
and by the Databricks University Alliance with infrastructure credits from AWS to 
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.

2022, Uppsala, Sweden

import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types._
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._ 
import org.apache.spark.sql._ 
import scala.util.matching.Regex

import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types._
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql._
import scala.util.matching.Regex
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._

The ArcGIS Runtime library supports coordinate transformations between spatial references.

  • Download the ArcGIS Runtime SDK for Java (.tgz) from https://developers.arcgis.com/downloads/#java

  • Install the jar (from the "libs" folder of the extracted archive) on the cluster.

The version used here is 100.4.0.
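Before running the install steps, it may help to see what a projection between spatial references actually computes. The sketch below implements the simplified spherical Web Mercator formula (EPSG:4326 longitude/latitude to EPSG:3857 metres) in plain Scala; it only illustrates the arithmetic for this one pair of coordinate systems, whereas `GeometryEngine.project` handles many spatial references and full datum transformations:

```scala
import scala.math.{Pi, abs, log, tan, toRadians}

// Simplified spherical Web Mercator: EPSG:4326 (lon/lat in degrees)
// to EPSG:3857 (metres). Illustration only, not the library's implementation.
val earthRadius = 6378137.0 // WGS84 semi-major axis, metres

def toWebMercator(lonDeg: Double, latDeg: Double): (Double, Double) = {
  val x = toRadians(lonDeg) * earthRadius
  val y = log(tan(Pi / 4 + toRadians(latDeg) / 2)) * earthRadius
  (x, y)
}
```

With the library itself, the corresponding call is roughly `project(new Point(lon, lat, SpatialReference.create(4326)), SpatialReference.create(3857))`.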

dbutils.fs.mkdirs("dbfs:/arcGISRuntime/")
res1: Boolean = true
tar zxvf /dbfs/arcGISRuntime/arcgis_runtime_sdk_java_100_4_0.tgz -C /dbfs/arcGISRuntime
arcgis-runtime-sdk-java-100.4.0/
arcgis-runtime-sdk-java-100.4.0/LICENSE.txt
arcgis-runtime-sdk-java-100.4.0/README.txt
arcgis-runtime-sdk-java-100.4.0/RELEASE-NOTES.txt
arcgis-runtime-sdk-java-100.4.0/jniLibs/
arcgis-runtime-sdk-java-100.4.0/jniLibs/OSX64/libruntimecore.dylib
arcgis-runtime-sdk-java-100.4.0/jniLibs/OSX64/libruntimecore_java.dylib
arcgis-runtime-sdk-java-100.4.0/jniLibs/LX64/libruntimecore_java.so
arcgis-runtime-sdk-java-100.4.0/jniLibs/LX64/libruntimecore.so
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/... (DirectX shader .cso files omitted)
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_fill_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/trianglemesh_texture_phongshadow_world_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_circle_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/seq_render_outlined_area_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/polyline_draw_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_line_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/tex_coor_to_tex_coor_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/trianglemesh_color_phongshadow_world_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/trianglemesh_world_depth_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/tex_quad_b8g8r8a8un_custom_filter_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/image_renderer_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_pattern_line_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/seq_render_line_pick_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_tile_info_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_pattern_line_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/seq_render_sdf_point_halo_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/star_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/tex_quad_b8g8r8a8un_adv_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_background_solid_fill_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/polygon_outline_overlay_draw_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/polygon_outline_overlay_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_dd_sdf_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/trianglemesh_color_phongshadow_world_instance_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/texture_draw_instanced_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/trianglemesh_texture_world_depth_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/tile_atmosphere_phong_world_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/polygon_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/screen_image_renderer_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/measure_line_world_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/tile_phongshadow_world_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/image_renderer_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/polyline_world_depth_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN64/
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN64/msvcp140.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN64/runtimecore.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN64/concrt140.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN64/runtimecore_java.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN64/vcruntime140.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN64/vccorlib140.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN32/
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN32/msvcp140.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN32/runtimecore.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN32/concrt140.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN32/runtimecore_java.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN32/vcruntime140.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN32/vccorlib140.dll
arcgis-runtime-sdk-java-100.4.0/legal/
arcgis-runtime-sdk-java-100.4.0/legal/third-party-software-acknowledgements.pdf
arcgis-runtime-sdk-java-100.4.0/legal/EULA.pdf
arcgis-runtime-sdk-java-100.4.0/legal/Copyright_and_Trademarks.pdf
arcgis-runtime-sdk-java-100.4.0/libs/
arcgis-runtime-sdk-java-100.4.0/libs/arcgis-java-api-javadoc.jar
arcgis-runtime-sdk-java-100.4.0/libs/commons-logging-1.2.jar
arcgis-runtime-sdk-java-100.4.0/libs/commons-codec-1.11.jar
arcgis-runtime-sdk-java-100.4.0/libs/arcgis-java-api.jar
arcgis-runtime-sdk-java-100.4.0/libs/gson-2.8.5.jar
arcgis-runtime-sdk-java-100.4.0/resources/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/alaska.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/icegrid2004.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/icegrid2004.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/stlrnc.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/stpaul.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/stgeorge.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/prvi.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/ICEGRID93.LOS
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/prvi.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/stgeorge.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/hawaii.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/stpaul.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/ICEGRID93.LAS
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/conus.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/stlrnc.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/hawaii.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/conus.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/alaska.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/hvtdefaults.json
arcgis-runtime-sdk-java-100.4.0/resources/pedata/vertical/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/vertical/egm/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/vertical/egm/egm96.grd
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gtdefaults.json
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/newzealand/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/newzealand/nzgd2kgrid0005.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/brazil/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/brazil/SAD69_003.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/brazil/CA7072_003.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/brazil/SAD96_003.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/brazil/CA61_003.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/austria/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/austria/AT_GIS_GRID.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/japan/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/japan/tky2jgd.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/japan/touhokutaiheiyouoki2011.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/france/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/france/rgf93_ntf.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/france/RGNC1991_IGN72GrandeTerre.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/france/RGNC1991_NEA74Noumea.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/ireland/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/ireland/tm75_etrs89.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/uk/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/uk/osgb36_xrail84.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/uk/OSTN02_NTv2.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/uk/OSTN15_NTv2.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/netherlands/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/netherlands/rdtrans2008.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/portugal/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/portugal/D73_ETRS89_geo.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/portugal/DLX_ETRS89_geo.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/switzerland/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/switzerland/CHENYX06_etrs.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/switzerland/CHENYX06.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/australia/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/australia/National_84_02_07_01.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/australia/A66_National_13_09_01.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/spain/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/spain/100800401.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/spain/peninsula.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/spain/baleares.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/spain/SPED2ETV2.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/germany/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/germany/BETA2007.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/germany/NTv2_SN.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/kshpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/flhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wmhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/c2hpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ohhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/uthpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/imhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mshpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/cnhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/lahpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/arhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/iahpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/tnhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nvhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nchpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ethpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ethpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/c1hpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ndhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/cshpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/alhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/cnhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/njhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wvhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/pvhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wshpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wyhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ohdhihpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/pahpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wyhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mehpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/lahpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nchpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/alhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/azhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/hihpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nvhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wshpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/cohpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wthpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ilhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/schpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/flhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/okhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mohpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/okhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wohpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nbhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/sdhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/azhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/inhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ndhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wohpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mnhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wmhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/eshpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mdhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/emhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/eshpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/iahpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wihpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mihpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nbhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/schpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/vahpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mshpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nmhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/imhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mnhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/guhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wvhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/vahpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/pahpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/pvhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nmhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wihpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nyhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nehpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mohpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/cohpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/kshpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/kyhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nehpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/emhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ohhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/tnhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/cshpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/guhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/njhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/kyhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mihpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/uthpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/c1hpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/gahpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nyhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ohdhihpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wthpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/hihpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/arhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/sdhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/gahpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mdhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ilhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/inhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/c2hpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mehpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/geoid/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/geoid/WGS84.img
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/inspire_cp_CadastralBoundary.gfs
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/esri_StatePlane_extra.wkt
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/gt_ellips.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/gt_datum.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/geoccs.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/vdv452.xsd
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/compdcs.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/netcdf_config.xsd
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/s57attributes.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/gml_registry.xml
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/s57objectclasses_aml.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/gdalvrt.xsd
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/inspire_cp_CadastralParcel.gfs
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/nitf_spec.xsd
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/esri_Wisconsin_extra.wkt
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ogrvrt.xsd
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/s57agencies.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/vdv452.xml
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/s57objectclasses_iw.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/seed_3d.dgn
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/GDALLogoBW.svg
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/prime_meridian.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/s57objectclasses.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/coordinate_axis.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/pci_datum.txt
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ozi_datum.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/header.dxf
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/pcs.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/epsg.wkt
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/gdal_datum.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/projop_wparm.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/pcs.override.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ecw_cs.wkt
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/inspire_cp_CadastralZoning.gfs
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/gdalicon.png
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/GDALLogoColor.svg
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ruian_vf_ob_v1.gfs
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ruian_vf_st_v1.gfs
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/pci_ellips.txt
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ruian_vf_st_uvoh_v1.gfs
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/s57expectedinput.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/stateplane.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/gcs.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ruian_vf_v1.gfs
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/GDALLogoGS.svg
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ozi_ellips.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/osmconf.ini
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/s57attributes_aml.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/cubewerx_extra.wkt
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/trailer.dxf
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/esri_extra.wkt
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ellipsoid.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/gcs.override.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/datum_shift.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/nitf_spec.xml
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/seed_2d.dgn
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/vertcs.override.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/s57attributes_iw.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/unit_of_measure.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/inspire_cp_BasicPropertyUnit.gfs
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/vertcs.csv
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/S-52x.stylx
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/S57DataDictionary.xml
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/news57.xml
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/lookup/
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/lookup/asymrefpb.dic
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/lookup/asymrefsb.dic
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/lookup/psymrefs.dic
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/lookup/lsymref.dic
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/lookup/psymreft.dic
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/day.clr
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/day_bright.col
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/day.col
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/day_blackback.col
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/day_whiteback.col
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/dusk.col
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/day_bright.clr
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/day_blackback.clr
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/night.clr
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/dusk.clr
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/night.col
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/day_whiteback.clr
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/ECDIS_settings.xml
arcgis-runtime-sdk-java-100.4.0/resources/symbols/
arcgis-runtime-sdk-java-100.4.0/resources/symbols/app6b/
arcgis-runtime-sdk-java-100.4.0/resources/symbols/app6b/app6b.stylx
arcgis-runtime-sdk-java-100.4.0/resources/symbols/mil2525c_b2/
arcgis-runtime-sdk-java-100.4.0/resources/symbols/mil2525c_b2/mil2525c_b2.stylx
arcgis-runtime-sdk-java-100.4.0/resources/symbols/app6d/
arcgis-runtime-sdk-java-100.4.0/resources/symbols/app6d/app6d.stylx
arcgis-runtime-sdk-java-100.4.0/resources/symbols/mil2525d/
arcgis-runtime-sdk-java-100.4.0/resources/symbols/mil2525d/mil2525d.stylx
arcgis-runtime-sdk-java-100.4.0/samples/
arcgis-runtime-sdk-java-100.4.0/samples/arcgis-java-samples-v100.4.0.zip
display(dbutils.fs.ls("dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/"))
path name size
dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/LICENSE.txt LICENSE.txt 174.0
dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/README.txt README.txt 1980.0
dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/RELEASE-NOTES.txt RELEASE-NOTES.txt 5227.0
dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/jniLibs/ jniLibs/ 0.0
dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/legal/ legal/ 0.0
dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/libs/ libs/ 0.0
dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/resources/ resources/ 0.0
dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/samples/ samples/ 0.0

The library needs to be initialized by running the following cell.

if(!ArcGISRuntimeEnvironment.isInitialized())
    {
      ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
      ArcGISRuntimeEnvironment.initialize() 
    }
Initializing...
Java version : 1.8.0_282 (Azul Systems, Inc.) amd64
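The `isInitialized` guard above makes the setup safe to call repeatedly, which matters because the same code later runs inside a UDF on every executor. The pattern can be sketched in plain Scala; `NativeLib` here is a hypothetical stand-in for `ArcGISRuntimeEnvironment`, not part of the ArcGIS API:

```scala
// Hypothetical stand-in for a library that must be initialized at most once per JVM,
// mirroring the ArcGISRuntimeEnvironment.isInitialized / initialize pattern above.
object NativeLib {
  private var initialized = false
  var initCount = 0 // only to demonstrate that initialization runs once

  def isInitialized: Boolean = initialized

  def initialize(): Unit = synchronized {
    if (!initialized) {
      initCount += 1 // real code would set the install dir and load native libs here
      initialized = true
    }
  }
}

// Repeated calls are no-ops, e.g. when the guard sits inside a UDF body:
if (!NativeLib.isInitialized) NativeLib.initialize()
if (!NativeLib.isInitialized) NativeLib.initialize() // second call does nothing
```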

Read the data that needs to be transformed: in this case, OSM node location data for Lithuania.

spark.conf.set("spark.sql.parquet.binaryAsString", true)

val nodes_df = spark.read.parquet("dbfs:/datasets/osm/lithuania/lithuania.osm.pbf.node.parquet")
nodes_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 7 more fields]
nodes_df.count()
res1: Long = 21212155
nodes_df.show(1,false)
+--------+-------+-------------+---------+---+--------+----------------------------+----------+------------------+
|id      |version|timestamp    |changeset|uid|user_sid|tags                        |latitude  |longitude         |
+--------+-------+-------------+---------+---+--------+----------------------------+----------+------------------+
|15389886|7      |1427965254000|0        |0  |        |[[highway, traffic_signals]]|54.7309125|25.239701200000003|
+--------+-------+-------------+---------+---+--------+----------------------------+----------+------------------+
only showing top 1 row

In this case, the coordinates are expressed in the WGS84 system (EPSG:4326) and will be projected into meters (EPSG:3035) for use with GeoMatch. To project between other reference systems, one just needs to change the corresponding EPSG codes in the next function.

def project_to_meters(lon: Double, lat: Double): String = { 
    
    if(!ArcGISRuntimeEnvironment.isInitialized())
    {
      ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
      ArcGISRuntimeEnvironment.initialize() 
    }
  
    val initial_point = new Point(lon, lat, SpatialReference.create(4326))
    val reprojection = GeometryEngine.project(initial_point, SpatialReference.create(3035))
    reprojection.toString
}
spark.udf.register("project_to_meters", project_to_meters(_:Double, _:Double):String)
project_to_meters: (lon: Double, lat: Double)String
res2: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,StringType,Some(List(DoubleType, DoubleType)))
val nodes_converted = nodes_df.selectExpr("id","latitude", "longitude", "project_to_meters(longitude, latitude) as new_coord")
nodes_converted.show(5,false)
+--------+------------------+------------------+---------------------------------------------------------------+
|id      |latitude          |longitude         |new_coord                                                      |
+--------+------------------+------------------+---------------------------------------------------------------+
|15389886|54.7309125        |25.239701200000003|Point: [5294624.872733, 3617234.130316, 0.000000, NaN] SR: 3035|
|15389895|54.732171400000006|25.243689500000002|Point: [5294845.235219, 3617425.427234, 0.000000, NaN] SR: 3035|
|15389899|54.7352788        |25.2467356        |Point: [5294962.370295, 3617805.661476, 0.000000, NaN] SR: 3035|
|15389959|54.7355529        |25.2458712        |Point: [5294901.580186, 3617823.871710, 0.000000, NaN] SR: 3035|
|15389961|54.735927100000005|25.245138800000003|Point: [5294846.689805, 3617854.789556, 0.000000, NaN] SR: 3035|
+--------+------------------+------------------+---------------------------------------------------------------+
only showing top 5 rows

nodes_converted: org.apache.spark.sql.DataFrame = [id: bigint, latitude: double ... 2 more fields]

Once the transformation is done, it is necessary to unpack the coordinates as follows.

def unpack_lat(str: String): String = {
        val lat = str.replaceAll(",","").replaceAll("\\[","").split(" ")(2)
        return lat
}
spark.udf.register("unpack_lat", unpack_lat(_:String): String)

def unpack_lon(str: String): String = {
        val lon = str.replaceAll(",","").replaceAll("\\[","").split(" ")(1)
        return lon
}
spark.udf.register("unpack_lon", unpack_lon(_:String): String)
unpack_lat: (str: String)String
unpack_lon: (str: String)String
res5: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(StringType)))
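The `unpack_lat`/`unpack_lon` helpers above pick coordinates by token position after stripping commas and brackets. A regex that captures the two bracketed numbers directly is a more defensive sketch of the same parse (plain Scala, no Spark needed; `unpackXY` is an illustrative name, not part of the notebook's pipeline):

```scala
// Parse strings of the form "Point: [x, y, z, m] SR: wkid" as produced above.
// The two capture groups are x (easting) and y (northing); the trailing .*
// swallows the z/m values and the spatial-reference suffix.
val PointPattern = """Point: \[([-+0-9.eE]+), ([-+0-9.eE]+),.*""".r

def unpackXY(s: String): Option[(String, String)] = s match {
  case PointPattern(x, y) => Some((x, y))
  case _                  => None // malformed input yields None instead of an exception
}

val sample = "Point: [5294624.872733, 3617234.130316, 0.000000, NaN] SR: 3035"
val parsed = unpackXY(sample)
// parsed == Some(("5294624.872733", "3617234.130316"))
```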
val new_coordinates = nodes_converted.selectExpr("id as node_id", "unpack_lat(new_coord) as reprojected_lat", "unpack_lon(new_coord) as reprojected_lon")
new_coordinates: org.apache.spark.sql.DataFrame = [node_id: bigint, reprojected_lat: string ... 1 more field]

Now, the new coordinates are expressed in meters.

new_coordinates.show(5,false)
+--------+---------------+---------------+
|node_id |reprojected_lat|reprojected_lon|
+--------+---------------+---------------+
|15389886|3617234.130316 |5294624.872733 |
|15389895|3617425.427234 |5294845.235219 |
|15389899|3617805.661476 |5294962.370295 |
|15389959|3617823.871710 |5294901.580186 |
|15389961|3617854.789556 |5294846.689805 |
+--------+---------------+---------------+
only showing top 5 rows
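Because EPSG:3035 is a metric projection, planar Euclidean distance between reprojected points approximates ground distance in meters. A quick sanity check using the first two rows of the table above (values copied from the output, so the exact figure is only illustrative):

```scala
// Coordinates (meters, EPSG:3035) of nodes 15389886 and 15389895 from the table above.
val (x1, y1) = (5294624.872733, 3617234.130316)
val (x2, y2) = (5294845.235219, 3617425.427234)

// Planar Euclidean distance; meaningful here because both axes are in meters.
val dist = math.hypot(x2 - x1, y2 - y1)
// dist is roughly 292 meters
```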
val nodes_new_coordinates = nodes_df.join(new_coordinates, nodes_df.col("id") === new_coordinates.col("node_id")).selectExpr("id", "version", "timestamp", "changeset", "uid", "user_sid", "tags", "reprojected_lat as latitude", "reprojected_lon as longitude")
nodes_new_coordinates: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 7 more fields]
nodes_new_coordinates.write.parquet("dbfs:/datasets/osm/lithuania/lithuania_nodes_converted.parquet")

ScaDaMaLe Course site and book

Segmentation of Lithuania by municipalities using Magellan

Virginia Jimenez Mohedano (LinkedIn) and Raazesh Sainudiin (LinkedIn).

This project was supported by UAB SENSMETRY through a Data Science Thesis Internship 
between 2022-01-17 and 2022-06-05 to Virginia J.M. and 
Databricks University Alliance with infrastructure credits from AWS to 
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.

2022, Uppsala, Sweden

Instructions

  1. Clone the Magellan repository from https://github.com/rahulbsw/magellan.git.
  2. Build the jar and copy it to your local machine.
  3. In Databricks choose Create -> Library and upload the packaged jar.
  4. Create a Spark 2.4.5 / Scala 2.11 cluster with the uploaded Magellan library installed. If you are already running such a cluster and have installed the uploaded library on it, detach and re-attach any notebook currently using that cluster.
import magellan.Point 
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.functions._
import org.apache.spark.sql.magellan.dsl.expressions._
val toPointUDF = udf{(x:Double,y:Double) => Point(x,y) }
import magellan.Point
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.functions._
import org.apache.spark.sql.magellan.dsl.expressions._
toPointUDF: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,org.apache.spark.sql.types.PointUDT@36ebe9ff,Some(List(DoubleType, DoubleType)))

After downloading the data (see the last cells of this notebook), we expect to have the following files in the distributed file system (dbfs):

  • LTcar_reprojected.csv is the CSV file with the crash data from Lithuania (LT).
  • municipalities.geojson is the geojson file containing LT municipalities.

First rows of the crash data, with columns: id, latitude, longitude, timestamp

//sc.textFile("dbfs:/datasets/magellan/LTcar_reprojected.csv").take(1).foreach(println)

The output of the above command with IDs and locations anonymised is as follows:

id,latitude,longitude,timestamp
LT20xyABCDEF,55.xxxxxx,21.yyyyyy,20xy-mm-dd hh:20:00.000+01:00
case class CrashRecord(id: String, timestamp: String, point: Point)
defined class CrashRecord

Load accident data and transform latitude and longitude to Magellan's Point

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
val crashes = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("dbfs:/datasets/magellan/LTcar_reprojected.csv").toDF()
val crashes_with_points = crashes.select(col("id"), col("timestamp"), col("longitude").cast(DoubleType), col("latitude").cast(DoubleType)).withColumn("point", toPointUDF($"longitude", $"latitude")).drop("latitude", "longitude").filter(col("timestamp").isNotNull)
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
crashes: org.apache.spark.sql.DataFrame = [id: string, latitude: double ... 2 more fields]
crashes_with_points: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [id: string, timestamp: timestamp ... 1 more field]
//crashes.show(1)

The output of the above command with IDs and locations anonymised is as follows:

+------------+---------+---------+-------------------+
|          id| latitude|longitude|          timestamp|
+------------+---------+---------+-------------------+
|LT20xyABCDEF|55.xxxxxx|21.yyyyyy|20xy-mm-dd hh:20:00|
//crashes_with_points.show(1,false)

The output of the above command with IDs and locations anonymised is as follows:

+------------+-------------------+---------------------------+
|id          |timestamp          |point                      |
+------------+-------------------+---------------------------+
|LT20xyABCDEF|20xy-mm-dd hh:20:00|Point(21.yyyyyy, 55.xxxxxx)|
val crashRecordCount = crashes_with_points.count() // how many crash records?
crashRecordCount: Long = 11945

The geojson format can spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as name or temperature.
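As a concrete illustration, a tiny GeoJSON FeatureCollection with a single point feature might look as follows (all names and values here are made up):

```python
import json

# a minimal GeoJSON FeatureCollection with one point feature
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [25.28, 54.69]},  # [lon, lat]
    "properties": {"name": "example well", "temperature": 7.5},
}
collection = {"type": "FeatureCollection", "features": [feature]}
text = json.dumps(collection)
```

Note that GeoJSON coordinates are ordered longitude first, then latitude.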

The name of the municipality in the metadata is "name" so let's keep only that one.

val municipalities = sqlContext.read.format("magellan")
                                   .option("type", "geojson")
                                   .load("dbfs:/datasets/magellan/municipalities.geojson")
                                   .filter($"polygon".isNotNull)
                                   .select($"polygon", $"metadata"("name") as "municipality")
municipalities: org.apache.spark.sql.DataFrame = [polygon: polygon, municipality: string]
municipalities.count()
res41: Long = 60
municipalities.show(100)
+--------------------+--------------------+
|             polygon|        municipality|
+--------------------+--------------------+
|magellan.Polygon@...|Visagino savivaldybė|
|magellan.Polygon@...|Ignalinos rajono ...|
|magellan.Polygon@...|Zarasų rajono sav...|
|magellan.Polygon@...|Vilkaviškio rajon...|
|magellan.Polygon@...|Šakių rajono savi...|
|magellan.Polygon@...|Utenos rajono sav...|
|magellan.Polygon@...|Švenčionių rajono...|
|magellan.Polygon@...|Šiaulių miesto sa...|
|magellan.Polygon@...|Panevėžio miesto ...|
|magellan.Polygon@...|Elektrėnų savival...|
|magellan.Polygon@...|Vilniaus miesto s...|
|magellan.Polygon@...|Marijampolės savi...|
|magellan.Polygon@...|Kazlų Rūdos saviv...|
|magellan.Polygon@...|Kalvarijos saviva...|
|magellan.Polygon@...|Kauno rajono savi...|
|magellan.Polygon@...|Vilniaus rajono s...|
|magellan.Polygon@...| Pagėgių savivaldybė|
|magellan.Polygon@...|Molėtų rajono sav...|
|magellan.Polygon@...|Anykščių rajono s...|
|magellan.Polygon@...|Klaipėdos miesto ...|
|magellan.Polygon@...|Šalčininkų rajono...|
|magellan.Polygon@...|Širvintų rajono s...|
|magellan.Polygon@...|Trakų rajono savi...|
|magellan.Polygon@...|Palangos miesto s...|
|magellan.Polygon@...|Kretingos rajono ...|
|magellan.Polygon@...|Ukmergės rajono s...|
|magellan.Polygon@...|Panevėžio rajono ...|
|magellan.Polygon@...|Kauno miesto savi...|
|magellan.Polygon@...|Druskininkų saviv...|
|magellan.Polygon@...|Varėnos rajono sa...|
|magellan.Polygon@...|Neringos savivaldybė|
|magellan.Polygon@...|Lazdijų rajono sa...|
|magellan.Polygon@...|Alytaus rajono sa...|
|magellan.Polygon@...|Alytaus miesto sa...|
|magellan.Polygon@...|Rokiškio rajono s...|
|magellan.Polygon@...|Biržų rajono savi...|
|magellan.Polygon@...|Kupiškio rajono s...|
|magellan.Polygon@...| Rietavo savivaldybė|
|magellan.Polygon@...|Pasvalio rajono s...|
|magellan.Polygon@...|Šilutės rajono sa...|
|magellan.Polygon@...|Skuodo rajono sav...|
|magellan.Polygon@...|Klaipėdos rajono ...|
|magellan.Polygon@...|Mažeikių rajono s...|
|magellan.Polygon@...|Pakruojo rajono s...|
|magellan.Polygon@...|Joniškio rajono s...|
|magellan.Polygon@...|Šiaulių rajono sa...|
|magellan.Polygon@...|Akmenės rajono sa...|
|magellan.Polygon@...|Radviliškio rajon...|
|magellan.Polygon@...|Kelmės rajono sav...|
|magellan.Polygon@...|Prienų rajono sav...|
|magellan.Polygon@...|Plungės rajono sa...|
|magellan.Polygon@...|Telšių rajono sav...|
|magellan.Polygon@...|Jonavos rajono sa...|
|magellan.Polygon@...|Raseinių rajono s...|
|magellan.Polygon@...|Tauragės rajono s...|
|magellan.Polygon@...|Kaišiadorių rajon...|
|magellan.Polygon@...|Šilalės rajono sa...|
|magellan.Polygon@...|Kėdainių rajono s...|
|magellan.Polygon@...|Jurbarko rajono s...|
|magellan.Polygon@...|Birštono savivaldybė|
+--------------------+--------------------+
//If the coordinate systems match, the next cell should not be empty
//The geojson file is in the WGS84 coordinate system

Join the accidents with the municipalities.

val joined = municipalities
            .join(crashes_with_points)
            .where($"point" within $"polygon")
            .select($"id", $"timestamp", $"municipality", $"point")
joined: org.apache.spark.sql.DataFrame = [id: string, timestamp: timestamp ... 2 more fields]
//joined.show(1,false)

The output of the above command with IDs and locations anonymised is as follows:

+------------+-------------------+--------------------+---------------------------+
|id          |timestamp          |municipality        |point                      |
+------------+-------------------+--------------------+---------------------------+
|LT20xyABCDEF|2019-09-08 20:10:00|Visagino savivaldybė|Point(26.xxxxxx, 55.yyyyy) |
val crashes_in_municipalities = joined.count() 
crashes_in_municipalities: Long = 11937
crashRecordCount - crashes_in_municipalities // records not inside any municipality polygon in the geojson file
res45: Long = 8
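The `within` predicate used in the join above is a point-in-polygon test. A minimal ray-casting sketch in Python (an illustration of the idea, not Magellan's actual implementation):

```python
def point_in_polygon(x, y, poly):
    # poly: list of (x, y) vertices; ray-casting algorithm:
    # count how often a ray from (x, y) crosses the polygon's edges
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line y
            xcross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < xcross:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
point_in_polygon(2, 2, square)  # True
point_in_polygon(5, 2, square)  # False
```

The 8 unmatched records simply fall outside every municipality polygon under this test.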
val municipality_count = joined
  .groupBy($"municipality")
  .agg(countDistinct("id").as("acc_count"))
  .orderBy(col("acc_count").desc)

municipality_count.show(5,false)
+----------------------------+---------+
|municipality                |acc_count|
+----------------------------+---------+
|Vilniaus miesto savivaldybė |2356     |
|Kauno miesto savivaldybė    |1461     |
|Klaipėdos miesto savivaldybė|733      |
|Panevėžio miesto savivaldybė|592      |
|Šiaulių miesto savivaldybė  |468      |
+----------------------------+---------+
only showing top 5 rows

municipality_count: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [municipality: string, acc_count: bigint]
val municipality_count_freq = municipality_count.withColumn("frequency", col("acc_count")/crashes_in_municipalities)
municipality_count_freq.show(10,false)
+----------------------------+---------+--------------------+
|municipality                |acc_count|frequency           |
+----------------------------+---------+--------------------+
|Vilniaus miesto savivaldybė |2356     |0.19736952333082014 |
|Kauno miesto savivaldybė    |1461     |0.12239256094496105 |
|Klaipėdos miesto savivaldybė|733      |0.061405713328306945|
|Panevėžio miesto savivaldybė|592      |0.04959370025969674 |
|Šiaulių miesto savivaldybė  |468      |0.039205830610706205|
|Vilniaus rajono savivaldybė |430      |0.03602245120214459 |
|Kauno rajono savivaldybė    |347      |0.02906928038870738 |
|Klaipėdos rajono savivaldybė|289      |0.02421043813353439 |
|Panevėžio rajono savivaldybė|280      |0.02345647985255927 |
|Šiaulių rajono savivaldybė  |214      |0.01792745245874173 |
+----------------------------+---------+--------------------+
only showing top 10 rows

municipality_count_freq: org.apache.spark.sql.DataFrame = [municipality: string, acc_count: bigint ... 1 more field]
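The frequency column is each municipality's crash count divided by the total number of matched crashes (11937). A quick sanity check in plain Python with the two largest counts from the table above:

```python
# counts taken from the table above; total is crashes matched to a municipality
counts = {"Vilniaus miesto savivaldybė": 2356, "Kauno miesto savivaldybė": 1461}
total = 11937
freqs = {m: c / total for m, c in counts.items()}
# freqs["Vilniaus miesto savivaldybė"] ≈ 0.1974, matching the table above
```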
municipality_count_freq.select("municipality","frequency").write.format("csv").option("header", true).save("dbfs:/datasets/lithuania/municipalities_freq.csv")

Download the most up-to-date population data from https://www.registrucentras.lt/p/853 and upload it

val municipality_pop = spark.read.format("csv").option("delimiter",";").option("header", "true").option("inferSchema", "true").load("dbfs:/datasets/lithuania/population.csv").toDF()
municipality_pop: org.apache.spark.sql.DataFrame = [municipality: string, population: int]
municipality_pop.show()
+--------------------+----------+
|        municipality|population|
+--------------------+----------+
|Akmenės rajono sa...|     20597|
|Alytaus miesto sa...|     53920|
|Alytaus rajono sa...|     28170|
|Anykščių rajono s...|     24619|
|Birštono savivaldybė|      4425|
|Biržų rajono savi...|     25141|
|Druskininkų saviv...|     21282|
|Elektrėnų savival...|     25903|
|Ignalinos rajono ...|     15495|
|Jonavos rajono sa...|     43564|
|Joniškio rajono s...|     22234|
|Jurbarko rajono s...|     27145|
|Kaišiadorių rajon...|     29746|
|Kalvarijos saviva...|     10737|
|Kauno miesto savi...|    313503|
|Kauno rajono savi...|    105032|
|Kazlų Rūdos saviv...|     11621|
|Kelmės rajono sav...|     27513|
|Klaipėdos miesto ...|    165710|
|Klaipėdos rajono ...|     67232|
+--------------------+----------+
only showing top 20 rows
val municipality_count_pop = municipality_count.join(municipality_pop, municipality_count.col("municipality") === municipality_pop.col("municipality")).withColumn("acc_by_pop", col("acc_count")/col("population")).drop(municipality_pop.col("municipality"))
municipality_count_pop: org.apache.spark.sql.DataFrame = [municipality: string, acc_count: bigint ... 2 more fields]
municipality_count_pop.show()
+--------------------+---------+----------+--------------------+
|        municipality|acc_count|population|          acc_by_pop|
+--------------------+---------+----------+--------------------+
|Vilniaus miesto s...|     2356|    592389|0.003977116388049069|
|Kauno miesto savi...|     1461|    313503| 0.00466024248571784|
|Klaipėdos miesto ...|      733|    165710|0.004423390260092...|
|Panevėžio miesto ...|      592|     91221|0.006489733723594348|
|Šiaulių miesto sa...|      468|    111289| 0.00420526736694552|
|Vilniaus rajono s...|      430|    108948|0.003946837023167015|
|Kauno rajono savi...|      347|    105032|0.003303755046081194|
|Klaipėdos rajono ...|      289|     67232|0.004298548310328415|
|Panevėžio rajono ...|      280|     38639|0.007246564352079505|
|Šiaulių rajono sa...|      214|     43923|0.004872162648270837|
|Šilutės rajono sa...|      196|     42330|0.004630285849279471|
|Plungės rajono sa...|      185|     35804|0.005167020444643056|
|Raseinių rajono s...|      175|     32598|0.005368427510890238|
|Telšių rajono sav...|      170|     42883|0.003964274887484551|
|Jonavos rajono sa...|      168|     43564|0.003856395188687...|
|Kėdainių rajono s...|      167|     49360|0.003383306320907...|
|Tauragės rajono s...|      165|     41256|0.003999418266433973|
|Marijampolės savi...|      162|     57937| 0.00279614063551789|
|Alytaus miesto sa...|      161|     53920|0.002985905044510386|
|Trakų rajono savi...|      160|     35864|0.004461298237787196|
+--------------------+---------+----------+--------------------+
only showing top 20 rows
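The acc_by_pop column comes from an inner join on the municipality name followed by a division. The same computation in plain Python over dicts, using two rows from the tables above:

```python
# two rows from the tables above
acc = {"Vilniaus miesto savivaldybė": 2356, "Kauno miesto savivaldybė": 1461}
pop = {"Vilniaus miesto savivaldybė": 592389, "Kauno miesto savivaldybė": 313503}
# inner-join semantics: keep only municipalities present in both dicts
acc_by_pop = {m: acc[m] / pop[m] for m in acc if m in pop}
```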
municipality_count_pop.select("municipality","acc_by_pop").write.format("csv").option("header", true).save("dbfs:/datasets/lithuania/municipalities_pop.csv")

Step 0: Download the datasets and load them into dbfs

  • get the accident data
  • get the Lithuanian municipality data
dbutils.fs.cp("dbfs:/FileStore/tables/ltcar_reprojected.csv", "dbfs:/datasets/magellan/LTcar_reprojected.csv")
res5: Boolean = true
display(dbutils.fs.ls("dbfs:/datasets/magellan/"))
path name size
dbfs:/datasets/magellan/LT_adm/ LT_adm/ 0.0
dbfs:/datasets/magellan/LTbhd/ LTbhd/ 0.0
dbfs:/datasets/magellan/LTcar_locations.csv LTcar_locations.csv 706938.0
dbfs:/datasets/magellan/LTcar_reprojected.csv LTcar_reprojected.csv 752891.0
dbfs:/datasets/magellan/SFNbhd/ SFNbhd/ 0.0
dbfs:/datasets/magellan/all.tsv all.tsv 6.0947802e7

Getting Lithuanian Administrative Divisions Data

Second-level Administrative Divisions, Lithuania, 2015

Data from https://github.com/seporaitis/lt-geojson

wget https://raw.githubusercontent.com/seporaitis/lt-geojson/master/geojson/municipalities.geojson
--2022-04-18 15:00:50--  https://raw.githubusercontent.com/seporaitis/lt-geojson/master/geojson/municipalities.geojson
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9590686 (9.1M) [text/plain]
Saving to: ‘municipalities.geojson’


2022-04-18 15:00:51 (44.9 MB/s) - ‘municipalities.geojson’ saved [9590686/9590686]
# Read and process the geojson, removing the "relations"/"@relations"
# properties (parsing fails with them and they are not needed)

import json

# municipalities / Savivaldybės
with open("municipalities.geojson", 'r') as infile:
    municipalities = json.load(infile)

for feature in municipalities['features']:
  feature["properties"].pop("relations", None)
  feature["properties"].pop("@relations", None)

# print the remaining properties of every feature
for feature in municipalities['features']:
  for property in feature["properties"]:
    print(property)

with open("municipalities.geojson", 'w') as outfile:
    json.dump(municipalities, outfile)
@id
ISO3166-2
admin_level
boundary
name
name:lt
name:pl
name:ru
type
wikidata
wikipedia
... (a similar list of properties is printed for each of the 60 municipalities; output truncated)
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:de
name:en
name:lt
name:pl
name:ru
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:lt
name:pl
name:ru
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:en
name:lt
name:pl
name:ru
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:lt
name:pl
name:ru
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:lt
name:pl
name:ru
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:lt
name:pl
name:ru
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:de
name:lt
name:pl
name:ru
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:lt
name:pl
name:ru
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:de
name:lt
name:pl
name:ru
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:lt
name:pl
name:ru
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:bat-smg
name:ca
name:de
name:en
name:es
name:fi
name:fr
name:it
name:ka
name:lt
name:lv
name:pl
name:ru
name:zh
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:lt
name:pl
name:ru
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:lt
name:pl
name:ru
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:lt
name:pl
name:ru
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:de
name:lt
name:pl
name:ru
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:lt
name:pl
name:ru
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:lt
name:lv
name:pl
name:ru
name:ur
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:bat-smg
name:ca
name:de
name:es
name:et
name:fi
name:fr
name:he
name:it
name:ka
name:lmo
name:lt
name:lv
name:pl
name:ru
name:ur
name:zh
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:de
name:lt
name:pl
name:ru
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:lt
name:pl
name:ru
type
wikidata
wikipedia
@id
ISO3166-2
admin_level
boundary
name
name:de
name:el
name:en
name:et
name:fi
name:fr
name:it
name:lt
name:lv
name:nl
name:no
name:pl
name:ru
type
wikidata
wikipedia
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
@id
dbutils.fs.cp("file:/databricks/driver/municipalities.geojson", "dbfs:/datasets/magellan/")
res36: Boolean = true

ScaDaMaLe Course site and book

Visualization of the segmentation by municipalities using Python.

Virginia Jimenez Mohedano (LinkedIn) and Raazesh Sainudiin (LinkedIn).

This project was supported by UAB SENSMETRY through a Data Science Thesis Internship 
between 2022-01-17 and 2022-06-05 to Virginia J.M. and 
Databricks University Alliance with infrastructure credits from AWS to 
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.

2022, Uppsala, Sweden

# Read the per-municipality accident frequencies computed previously

from pyspark.sql.types import StructType, StringType, DoubleType

schema = StructType() \
      .add("municipality", StringType(), True) \
      .add("frequency", DoubleType(), True)

municipality_freq = spark.read.format("csv").option("header", True).schema(schema).load("dbfs:/datasets/lithuania/municipalities_freq.csv")
municipality_freq.show(1000)
+--------------------+--------------------+
|        municipality|           frequency|
+--------------------+--------------------+
|Kaišiadorių rajon...|0.010304096506659964|
|Kelmės rajono sav...|0.010304096506659964|
|Pakruojo rajono s...|0.003853564547206...|
|Skuodo rajono sav...|0.003853564547206...|
|Elektrėnų savival...|0.004356203401189578|
|Kazlų Rūdos saviv...|0.004356203401189578|
|Neringos savivaldybė|0.001507916561950...|
|Birštono savivaldybė|0.001507916561950...|
|Šalčininkų rajono...|0.009047499371701432|
|Švenčionių rajono...|0.005277707966825836|
|Radviliškio rajon...|0.011979559353271342|
|Vilkaviškio rajon...|0.008628633660048589|
|Širvintų rajono s...|0.003518471977883...|
|Klaipėdos miesto ...|0.061405713328306945|
|Panevėžio miesto ...| 0.04959370025969674|
|Panevėžio rajono ...| 0.02345647985255927|
|Kėdainių rajono s...|0.013990114769204993|
|Mažeikių rajono s...|0.013236156488229874|
|Anykščių rajono s...| 0.00636675881712323|
|Šiaulių miesto sa...|0.039205830610706205|
|Klaipėdos rajono ...| 0.02421043813353439|
|Šilutės rajono sa...|0.016419535896791487|
|Raseinių rajono s...|0.014660299907849544|
|Tauragės rajono s...|0.013822568484543855|
|Rokiškio rajono s...|0.008293541090726313|
|Ukmergės rajono s...|0.008042221663734606|
|Šilalės rajono sa...|0.006199212532462093|
|Molėtų rajono sav...|0.005612800536148...|
|Joniškio rajono s...|0.005445254251486974|
|Kupiškio rajono s...|0.005193934824495267|
|Akmenės rajono sa...|0.004272430258859...|
|Ignalinos rajono ...|0.003602245120214459|
|Šiaulių rajono sa...| 0.01792745245874173|
|Plungės rajono sa...|0.015498031331155232|
|Telšių rajono sav...|0.014241434196196699|
|Kretingos rajono ...| 0.01130937421462679|
|Pasvalio rajono s...|0.010806735360643378|
|Palangos miesto s...|0.010387869648990534|
|Varėnos rajono sa...|0.007874675379073468|
|Lazdijų rajono sa...|0.004775069112842423|
|Jurbarko rajono s...|0.003937337689536734|
|Zarasų rajono sav...|0.002680740554578...|
|Vilniaus miesto s...| 0.19736952333082014|
|Vilniaus rajono s...| 0.03602245120214459|
|Jonavos rajono sa...|0.014073887911535563|
|Prienų rajono sav...|0.009382591941023708|
|Šakių rajono savi...|0.008963726229370864|
|Biržų rajono savi...|0.006701851386445506|
|Alytaus miesto sa...| 0.01348747591522158|
|Trakų rajono savi...|0.013403702772891012|
|Alytaus rajono sa...| 0.01206333249560191|
|Utenos rajono sav...|0.010555415933651672|
|Druskininkų saviv...|0.002596967412247...|
|Marijampolės savi...| 0.01357124905755215|
|Kauno miesto savi...| 0.12239256094496105|
|Kauno rajono savi...| 0.02906928038870738|
|Kalvarijos saviva...|0.002261874842925358|
| Pagėgių savivaldybė|0.001424143419619...|
|Visagino savivaldybė|0.002513194269917...|
| Rietavo savivaldybė|0.003183379408561615|
+--------------------+--------------------+
# Calculating colors

# https://matplotlib.org/stable/tutorials/colors/colormaps.html
from matplotlib.cm import viridis
from matplotlib.colors import to_hex

min_freq = municipality_freq.agg({"frequency":"min"}).collect()[0][0]
max_freq = municipality_freq.agg({"frequency":"max"}).collect()[0][0]
freq_range = max_freq - min_freq

def calculate_color(row):
    """
    Convert a municipality's frequency to a CSS hex color.
    """
    freq = row["frequency"]

    # scale freq to a number between 0 and 1
    normalized_freq = (freq - min_freq) / freq_range

    # invert: in the viridis colormap darker means lower values, and we want the opposite
    inverse_freq = 1 - normalized_freq

    # map the normalized frequency to a matplotlib color
    mpl_color = viridis(inverse_freq)

    # convert the matplotlib color to a valid CSS hex color
    gmaps_color = to_hex(mpl_color, keep_alpha=False)

    return (row["municipality"], gmaps_color)

# Calculate a color for each district
colors = municipality_freq.rdd.map(lambda row: calculate_color(row)).collectAsMap()
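The color mapping above can be checked in isolation. Below is a minimal sketch (no Spark needed) of the same min-max normalization onto the inverted viridis scale; `freq_to_hex` is a hypothetical helper introduced here only for illustration:

```python
from matplotlib.cm import viridis
from matplotlib.colors import to_hex

def freq_to_hex(freq, min_freq, max_freq):
    """Map a frequency to a hex color on the inverted viridis scale."""
    normalized = (freq - min_freq) / (max_freq - min_freq)
    # invert so that higher frequencies get the darker end of viridis
    return to_hex(viridis(1 - normalized), keep_alpha=False)

# the extremes map to the two ends of the colormap
print(freq_to_hex(0.0, 0.0, 1.0))  # "#fde725" (bright end, lowest frequency)
print(freq_to_hex(1.0, 0.0, 1.0))  # "#440154" (dark end, highest frequency)
```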
// Temporary copy of the geojson to the local filesystem so that Python can read it
dbutils.fs.cp("dbfs:/datasets/magellan/municipalities.geojson", "file:/databricks/driver/")
res0: Boolean = true
# Reading and processing geojson (map and borders)

import json
import gmaps
import gmaps.datasets
import gmaps.geojson_geometries
from ipywidgets.embed import embed_minimal_html

gmaps.configure(api_key="YOUR_GOOGLE_API_KEY") # replace with your own Google Maps API key

# municipalities / Savivaldybės
municipalities = json.load(open('municipalities.geojson', 'r'))

# Collect indices of non-Polygon features (the municipality capitals are Point features)
list_to_remove = []
for i, feature in enumerate(municipalities['features']):
  if feature["geometry"]["type"] != "Polygon":
    list_to_remove.append(i)

# Delete them in reverse order so that earlier indices remain valid
for index in sorted(list_to_remove, reverse=True):
    del municipalities['features'][index]
    
# Order the colors to match the geojson feature order

ordered_colors = []
for feature in municipalities['features']:
  municipality = feature['properties']['name']
  color = colors[municipality]
  ordered_colors.append(color)
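The capital-removal step above deletes by index in reverse so that earlier indices stay valid. The same filter can be sketched as a single list comprehension; the toy feature dicts here are made up for illustration:

```python
# A minimal sketch of the filter above: keep only Polygon features,
# dropping Point features (the municipality capitals).
geojson = {"features": [
    {"geometry": {"type": "Polygon"}, "properties": {"name": "A"}},
    {"geometry": {"type": "Point"},   "properties": {"name": "A (capital)"}},
    {"geometry": {"type": "Polygon"}, "properties": {"name": "B"}},
]}

geojson["features"] = [f for f in geojson["features"]
                       if f["geometry"]["type"] == "Polygon"]

print([f["properties"]["name"] for f in geojson["features"]])  # ['A', 'B']
```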
import matplotlib
from matplotlib import cm

# Generating map

fig = gmaps.figure()
freq_layer = gmaps.geojson_layer(
    municipalities,
    fill_color=ordered_colors,
    fill_opacity=0.8,
    stroke_color='black',
    stroke_opacity=1.0,
    stroke_weight=0.2)
fig.add_layer(freq_layer)

embed_minimal_html("export.html", views=[fig])
# Adding color legend to map
cmap = cm.get_cmap('viridis', 20)

gradient = ""
for i in reversed(range(cmap.N)):
    rgba = cmap(i)
    # rgb2hex accepts rgb or rgba
    gradient = gradient + "," + matplotlib.colors.rgb2hex(rgba)

# Removing first comma
gradient = gradient[1:]

html_file_content = open("export.html", 'r').read()\
                    .replace("</head>", """<style>
                                 .legend {
                                   max-width: 430px;
                                 }
                                  .legend div{
                                   background: linear-gradient(to right, """ + gradient + """);
                                   border-radius: 4px;
                                   padding: 10px;
                                 }

                                .legend p {
                                  text-align: justify;
                                  text-justify: inter-word;
                                  margin: 0px;
                                      margin-block-start: 0em;
                                    margin-block-end: 0em;
                                    height: 1em;
                                }
                                .legend p:after {
                                  content: "";
                                  display: inline-block;
                                  width: 100%;
                                }
                              </style>
                            </head>""")\
                    .replace("</body>","""
                          <h2>Relative frequency of accidents</h2>
                          <div class="legend">
                            <p>""" + str(round(min_freq,2)) + " " + str(round(max_freq,2)) +"""</p>
                            <div></div>
                          </div>
                        </body>""")
# NOTE: displaying the exported widget HTML can only be run once per cluster restart
displayHTML(html_file_content)
IPyWidget export